Lidia Pivovarova | Researchain

Publication

Featured research published by Lidia Pivovarova.

Artificial Intelligence and Natural Language | 2017

ParaPhraser: Russian paraphrase corpus and shared task

Lidia Pivovarova; Ekaterina V. Pronoza; Elena Yagunova; Anton Pronoza

The paper describes the results of the First Russian Paraphrase Detection Shared Task, held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection, and generation has been developing successfully for a long time, but interest in the problem has surged only recently in the Russian computational linguistics community. We try to close this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task that uses the corpus as training data. The participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning. The results reflect the following tendencies: the best scores are obtained by traditional classifiers combined with fine-grained linguistic features; however, complex neural networks, shallow methods, and purely technical methods also demonstrate competitive results.

International Conference on Statistical Language and Speech Processing | 2014

Supervised Classification Using Balanced Training

Mian Du; Matthew Pierce; Lidia Pivovarova; Roman Yangarber

We examine supervised learning for multi-class, multi-label text classification. We are interested in exploring classification in a real-world setting, where the distribution of labels may change dynamically over time. First, we compare the performance of an array of binary classifiers trained on the label distribution found in the original corpus against classifiers trained on balanced data, where we try to make the label distribution as nearly uniform as possible. We discuss the performance trade-offs between balanced vs. unbalanced training, and highlight the advantages of balancing the training set. Second, we compare the performance of two classifiers, Naive Bayes and SVM, with several feature-selection methods, using balanced training. We combine a Named-Entity-based rote classifier with the statistical classifiers to obtain better performance than either method alone.
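The idea of balanced training can be illustrated with a minimal sketch. It assumes single-label examples and plain downsampling to the size of the rarest label; the abstract does not state the exact balancing procedure, so the function name `balance_by_downsampling` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, seed=0):
    """Return a subset of (text, label) pairs with a uniform label distribution.

    Assumption: every label is cut down to the size of the rarest label.
    This is one simple way to balance data, not necessarily the paper's.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # The rarest label determines how many examples each label keeps.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy corpus: 8 documents for one label, 2 for another (made-up data).
corpus = [("doc%d" % i, "merger") for i in range(8)]
corpus += [("doc%d" % i, "lawsuit") for i in range(8, 10)]
balanced = balance_by_downsampling(corpus)
```

Downsampling discards data from frequent labels, which is one side of the trade-off the abstract mentions; upsampling rare labels would be an alternative under the same assumption.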

SLSP'13 Proceedings of the First International Conference on Statistical Language and Speech Processing | 2013

MDL-based models for transliteration generation

Javad Nouri; Lidia Pivovarova; Roman Yangarber

This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models are an extension of our work on automatic discovery of patterns of etymological sound change, based on the Minimum Description Length Principle. The models for pairwise alignment are extended with algorithms for prediction that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% word-level accuracy and up to 99% symbol-level F-score. We discuss the results from several perspectives, and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect the performance.

International Conference on Computational Linguistics | 2010

Ontological parsing of encyclopedia information

Victor Bocharov; Lidia Pivovarova; Valery Rubashkin; Boris Chuprin

Semi-automatic ontology learning from an encyclopedia is presented, with a primary focus on the syntactic and semantic analysis of definitions.

Applications of Natural Language to Data Bases | 2015

Improving Supervised Classification Using Information Extraction

Mian Du; Matthew Pierce; Lidia Pivovarova; Roman Yangarber

We explore supervised learning for multi-class, multi-label text classification, focusing on real-world settings, where the distribution of labels changes dynamically over time. We use the PULS Information Extraction system to collect information about the distribution of class labels over named entities found in text. We then combine a knowledge-based rote classifier with statistical classifiers to obtain better performance than either classification method alone. The resulting classifier yields a significant improvement in macro-averaged F-measure compared to the state of the art, while maintaining comparable micro-average.
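The difference between the macro-averaged and micro-averaged F-measure can be shown with a short sketch. It uses the standard definitions computed from per-class true positive, false positive, and false negative counts; the class names and counts below are made up for illustration and do not come from the paper.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts maps each label to a (tp, fp, fn) tuple.

    Macro: average the per-label F1 scores, so every label weighs the same.
    Micro: pool the counts over all labels first, so frequent labels dominate.
    """
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return macro, micro


# Hypothetical counts: one frequent, well-classified label and one rare, poorly classified label.
counts = {"frequent": (90, 10, 10), "rare": (1, 4, 5)}
macro, micro = macro_micro_f1(counts)
```

With these counts the macro average is pulled down by the rare label while the micro average stays close to the frequent label's score, which is why improving rare classes shows up mainly in the macro-averaged figure.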

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing | 2017

The First Cross-Lingual Challenge on Recognition, Normalization, and Matching of Named Entities in Slavic Languages

Jakub Piskorski; Lidia Pivovarova; Jan Šnajder; Josef Steinberger; Roman Yangarber

This paper describes the outcomes of the first challenge on multilingual named entity recognition that aimed at recognizing mentions of named entities in web documents in Slavic languages, their normalization/lemmatization, and cross-language matching. It was organised in the context of the 6th Balto-Slavic Natural Language Processing Workshop, co-located with the EACL 2017 conference. Although eleven teams signed up for the evaluation, due to the complexity of the task(s) and the short time available for elaborating a solution, only two teams submitted results on time. The reported evaluation figures reflect the relatively higher level of complexity of named entity-related tasks in the context of processing texts in Slavic languages. Since the duration of the challenge goes beyond the date of the publication of this paper, an updated picture of the participating systems and their corresponding performance can be found on the web page of the challenge.

European Conference on Information Retrieval | 2016

Tracking Interactions Across Business News, Social Media, and Stock Fluctuations

Ossi Karkulahti; Lidia Pivovarova; Mian Du; Jussi Kangasharju; Roman Yangarber

In this paper we study the interactions between how companies are mentioned in news, their presence on social media, and daily fluctuation in their stock prices. Our experiments demonstrate that for some entities these time series can be correlated in interesting ways, though for others the correspondences are more opaque. In this study, social media presence is measured by counting Wikipedia page hits. This work is done in the context of building a system for aggregating and analyzing news text, which aims to help the user track business trends; we show results obtainable by the system.
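One way to check whether two daily series move together is a lagged Pearson correlation. The sketch below is an assumption made for illustration: the abstract does not say which correlation measure or lags were used, and the function names and toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag in days."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Made-up daily counts: page hits follow news mentions one day later, scaled by 10.
news_mentions = [3, 5, 2, 8, 7, 4, 6]
page_hits = [10, 30, 50, 20, 80, 70, 40]
same_day = lagged_correlation(news_mentions, page_hits, 0)
next_day = lagged_correlation(news_mentions, page_hits, 1)
```

In this toy data the one-day lag gives a much stronger correlation than the same-day comparison, the kind of relationship that a plain same-day correlation would miss.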

Artificial Intelligence and Natural Language | 2015

Preface of AINL-ISMW FRUCT conference proceedings

Sergey Balandin; Maxim Buzdalov; Tatiana Lando; Lidia Pivovarova; Svetlana Popova; Dmitry Ustalov; Jan Zizka

We welcome you to the Artificial Intelligence and Natural Language & Information Extraction, Social Media and Web Search (AINL-ISMW) FRUCT Conference. This is the first time that these conferences and the international school are organized together in the beautiful city of Saint-Petersburg. All events of the conference are hosted on the grounds of Saint-Petersburg State University and ITMO University, which are both known as regional leaders in IT and ICT, with long histories and strong scientific schools, as well as strong traditions of cooperation with universities all around the globe.

Workshop on Events: Definition, Detection, Coreference, and Representation | 2013