Publication


Featured research published by Ekaterina V. Pronoza.


Russian Summer School in Information Retrieval | 2015

Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction

Ekaterina V. Pronoza; Elena Yagunova; Anton Pronoza

This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, available at paraphraser.ru. At the moment there are 5181 annotated sentence pairs, 4758 of which are included in the corpus. Annotation is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.


Mexican International Conference on Artificial Intelligence | 2015

Low-Level Features for Paraphrase Identification

Ekaterina V. Pronoza; Elena Yagunova

This paper deals with the task of sentential paraphrase identification. We work with Russian, but our approach can be applied to any other language with rich morphology and free word order. As part of our ParaPhraser.ru project, we construct a paraphrase corpus and then experiment with supervised methods of paraphrase identification. In this paper we focus on low-level string, lexical and semantic features which, unlike complex deep ones, do not introduce information noise and can serve as a solid basis for developing an effective paraphrase identification system. Experimental results show that the features introduced in this paper improve the paraphrase identification model that is based solely on the standard low-level features or on the optimized matrix metric used for corpus construction.


Artificial Intelligence and Natural Language | 2015

Comparison of sentence similarity measures for Russian paraphrase identification

Ekaterina V. Pronoza; Elena Yagunova

In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all the experiments are conducted on the Russian paraphrase corpus that we have collected from news headlines (collection is still ongoing). Apart from the similarity measures, we also analyze the corpus itself. Our results disprove the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also offer recommendations for applying different similarity measures to the identification of paraphrases derived from news texts.


Artificial Intelligence and Natural Language | 2017

ParaPhraser: Russian paraphrase corpus and shared task

Lidia Pivovarova; Ekaterina V. Pronoza; Elena Yagunova; Anton Pronoza

The paper describes the results of the First Russian Paraphrase Detection Shared Task, held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection and generation has been developing successfully for a long time, whereas interest in the problem within the Russian computational linguistics community has surged only recently. We try to bridge this gap by introducing the ParaPhraser.ru project, dedicated to the collection of a Russian paraphrase corpus, and by organizing a Paraphrase Detection Shared Task that uses the corpus as training data. The participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning. The results reflect the following tendencies: the best scores are obtained by traditional classifiers combined with fine-grained linguistic features; however, complex neural networks, shallow methods and purely technical methods also demonstrate competitive results.


Mexican International Conference on Artificial Intelligence | 2013

The Use of Horizontal Visibility Graphs to Identify the Words that Define the Informational Structure of a Text

Dmitry V. Lande; Andrey Snarskii; Elena Yagunova; Ekaterina V. Pronoza

A compactified horizontal visibility graph is proposed for building a language network and identifying the words that define the informational structure of a text. Networks constructed in this way were found to be scale-free, and the nodes with the largest degrees include words that determine not only the communicative structure of a text but also its informational structure.
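The abstract does not say how the graph is built, so the following is only a minimal sketch of the standard horizontal visibility criterion: positions i and j are linked when every value strictly between them is lower than both endpoints. The numeric weight assigned to each word position and the `compactify` step (merging all positions of the same word into one node) are assumptions made for illustration, not the authors' published procedure.

```python
def horizontal_visibility_graph(series):
    """Return the edges (i, j), i < j, of the horizontal visibility graph.

    Positions i and j are linked when every value strictly between them
    is lower than min(series[i], series[j]).
    """
    edges = set()
    n = len(series)
    for i in range(n):
        highest_between = float("-inf")  # tallest value seen between i and j
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # position i is blocked from seeing anything further
    return edges


def compactify(words, edges):
    """Assumed compactification: merge positions that hold the same word.

    Self-loops (an edge between two occurrences of one word) are dropped.
    """
    word_edges = set()
    for i, j in edges:
        if words[i] != words[j]:
            word_edges.add(tuple(sorted((words[i], words[j]))))
    return word_edges


# Hypothetical per-position weights for a four-word toy text.
words = ["a", "b", "a", "c"]
weights = [3, 1, 2, 4]
position_edges = horizontal_visibility_graph(weights)
word_graph = compactify(words, position_edges)
```

Node degrees in `word_graph` could then be ranked to look for the high-degree words the abstract mentions.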


Mexican International Conference on Artificial Intelligence | 2016

Sentence Paraphrase Graphs: Classification Based on Predictive Models or Annotators’ Decisions?

Ekaterina V. Pronoza; Elena Yagunova; Nataliya Kochetkova

As part of our ParaPhraser project on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases or non-paraphrases. The corpus is annotated via crowdsourcing by naive native Russian speakers, but from an expert's point of view, our complex paraphrase detection model can be more successful at predicting the paraphrase class than a naive native speaker.


International Conference on Statistical Language and Speech Processing | 2014

Corpus-Based Information Extraction and Opinion Mining for the Restaurant Recommendation System

Ekaterina V. Pronoza; Elena Yagunova; Svetlana Volskaya

In this paper a corpus-based information extraction and opinion mining method is proposed. Our domain is restaurant reviews, and our information extraction and opinion mining module is part of a Russian knowledge-based recommendation system.


Language and Technology Conference | 2013

Aspect-Based Restaurant Information Extraction for the Recommendation System

Ekaterina V. Pronoza; Elena Yagunova; Svetlana Volskaya

In this paper the information extraction task for a restaurant recommendation system is considered. We develop an information extraction system intended to gather restaurant aspects from users’ reviews and pass them to the recommendation module. As many of the restaurant aspects are subjective, our task can also be called sentiment analysis, or opinion mining. Thus, we present an aspect-based approach to sentiment analysis of restaurant reviews for e-tourism recommender systems. The analyzed frames are service and food quality, cuisine, price level, noise level, etc. In this paper we focus on service quality, cuisine type and food quality. As part of the preprocessing phase, a method for analyzing a corpus of Russian reviews (as part of information extraction) is proposed. Its importance is shown at the experimental phase, when the application of machine learning techniques to aspect extraction is analyzed. It is shown that the information obtained during corpus analysis improves system performance. We conduct experiments with several feature sets and classifiers and show that the use of resources learnt from the corpus improves the models. Naive Bayes appears to be the best choice for sentiment classification, while Logistic Regression and SVM are best at deciding on the relevance of a review with respect to a particular aspect.


Social Informatics | 2018

News Headline as a Form of News Text Compression

Nataliya Kochetkova; Ekaterina V. Pronoza; Elena Yagunova

In this paper we analyze news text collections (clusters) by extracting their paraphrase headlines into a paraphrase graph and working with this graph. Our aim is to test whether a news headline is an appropriate form of news text compression. Different types of news collections (dynamic, static and combined clusters) are analyzed, and it is shown that their respective paraphrase graphs reflect the characteristics of the texts. We also automatically extract the most informationally important linked fragments of news texts; these fragments characterize news texts as either informative ones, conveying some information, or publicistic ones, trying to affect the readers emotionally. It is shown that news headlines of the informative type do represent their respective news reports in compressed form.


Conference on Intelligent Text Processing and Computational Linguistics | 2016

A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models

Ekaterina V. Pronoza; Elena Yagunova

Our main objectives are constructing a paraphrase corpus for Russian and developing paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies, which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These pairs of sentences are further annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation, available at http://paraphraser.ru. There are 7480 annotated sentence pairs in the corpus at the moment, and more are to come. The types and features of these sentence pairs are not disclosed to the annotators. We adopt a three-class classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).
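The threshold-based candidate extraction described in the abstract can be illustrated with a small sketch. The abstract does not define the matrix similarity metric, so plain Jaccard word overlap is swapped in here as a stand-in, and the function names and the threshold value of 0.5 are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity; a stand-in, NOT the paper's matrix metric."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def paraphrase_candidates(headlines, threshold=0.5):
    """Return (headline, headline, score) for pairs scoring at or above threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    candidates = []
    for a, b in combinations(headlines, 2):
        score = jaccard(a, b)
        if score >= threshold:
            candidates.append((a, b, score))
    return candidates


# Toy headlines invented for this example.
headlines = [
    "the bank raised rates",
    "the bank raised interest rates",
    "storm hits coast",
]
pairs = paraphrase_candidates(headlines)
```

In this sketch only the first two headlines pass the threshold; pairs selected this way would then go to annotators for labeling as precise paraphrases, loose paraphrases or non-paraphrases.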

Collaboration


Top Co-Authors

Elena Yagunova

Saint Petersburg State University

Svetlana Volskaya

Saint Petersburg State University

Anton Pronoza

Russian Academy of Sciences

Olga Makarova

Saint Petersburg State University
