Liana Ermakova | Researchain

Publication

Featured research published by Liana Ermakova.

International Workshop of the Initiative for the Evaluation of XML Retrieval | 2011

IRIT at INEX: Question Answering Task

Liana Ermakova; Josiane Mothe

In this paper we describe an approach for tweet contextualization developed in the context of the INEX question answering track. The task is to provide a context of up to 500 words for a tweet. The summary should be an extract from Wikipedia. Our approach is based on an index that includes not only lemmas but also named entities (NE). Sentence retrieval is based on the standard TF-IDF measure enriched by named entity recognition, part-of-speech (POS) weighting, and smoothing from the local context. The method was ranked first in the INEX QA track according to content evaluation.
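The sentence retrieval idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `ne_boost` factor, and the smoothed IDF formula are assumptions made for illustration, and POS weighting and local-context smoothing are left out. Sentences are ranked by cosine similarity to the tweet in TF-IDF space, with query terms that are named entities given extra weight.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one TF-IDF vector (dict term -> weight) per tokenized sentence."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed IDF (an assumption; the abstract only says "standard TF-IDF").
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(query_tokens, sentences, entities=frozenset(), ne_boost=2.0):
    """Return sentence indices ordered from most to least similar to the query.

    `entities` and `ne_boost` are hypothetical: query terms recognized as
    named entities have their weight multiplied by `ne_boost`.
    """
    vectors, idf = tfidf_vectors(sentences)
    query = {
        t: c * idf.get(t, 0.0) * (ne_boost if t in entities else 1.0)
        for t, c in Counter(query_tokens).items()
    }
    scores = [cosine(query, v) for v in vectors]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

A context of up to 500 words could then be assembled by taking sentences in ranked order until the word budget is used up.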

Cross-Language Evaluation Forum | 2015

A Method for Short Message Contextualization: Experiments at CLEF/INEX

Liana Ermakova

This paper presents the approach we developed for automatic multi-document summarization applied to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting, and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method has been evaluated at the INEX/CLEF tweet contextualization track. We provide the evaluation results over the 4 years of the track. The method was also adapted to snippet retrieval and query expansion. The evaluation results indicate good performance of the approach.

Cross-Language Evaluation Forum | 2017

CLEF 2017 Microblog Cultural Contextualization Lab Overview

Liana Ermakova; Lorraine Goeuriot; Josiane Mothe; Philippe Mulhem; Jian-Yun Nie; Eric SanJuan

The MC2 CLEF 2017 lab deals with how the cultural context of a microblog affects its social impact at large. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. Regular Lab participants have access to the private massive multilingual microblog stream of The Festival Galleries project. Festivals have a large presence on social media. The resulting microblog stream and related URLs are appropriate for experimenting with advanced social media search and mining methods. A collection of 70,000,000 microblogs over 18 months dealing with cultural events in all languages has been released to test multilingual content analysis and microblog search. For content analysis, topics were in any language and results were expected in four languages: English, Spanish, French, and Portuguese. For microblog search, topics were in four languages: Arabic, English, French, and Spanish, and results were expected in any language.

ACM Symposium on Applied Computing | 2016

Proximity relevance model for query expansion

Liana Ermakova; Josiane Mothe; Elena Nikitina

Query expansion (QE) aims at improving information retrieval effectiveness by enhancing the query formulation. Because users' queries are generally short and because of language ambiguity, some information needs are difficult to satisfy. Query reformulation and QE methods have been developed to address this issue. Pseudo relevance feedback (PRF) considers the top retrieved documents as relevant and uses their content to expand the initial query. Rather than considering feedback documents as a bag of words, it is possible to exploit term proximity information. Although there is some research in this direction, the majority of it is empirical. The lack of theoretical work in this area motivated us to introduce a novel method integrated into the language model formalism that takes advantage of the remoteness of candidate terms for QE from query terms within feedback documents. In contrast to previous work, our approach captures the proximity directly and in terms of sentences rather than tokens. We show that the method significantly improves the retrieval performance on TREC collections, especially for difficult queries.
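As a rough illustration of sentence-level proximity in pseudo relevance feedback, the sketch below scores candidate expansion terms by how many sentences separate them from the nearest sentence containing a query term. It is not the language-model formulation of the paper: the exponential kernel, the `decay` parameter, and the function name are assumptions chosen only to show the idea.

```python
import math
from collections import defaultdict


def expansion_terms(query_terms, feedback_docs, k=3, decay=1.0):
    """Pick the k best expansion terms from pseudo-relevant documents.

    Each document is a list of sentences and each sentence a list of tokens.
    A candidate term occurrence contributes exp(-decay * d), where d is the
    distance in sentences to the closest sentence holding a query term
    (a hypothetical kernel, not the one used by the authors).
    """
    query = set(query_terms)
    scores = defaultdict(float)
    for doc in feedback_docs:
        # Sentences that contain at least one query term act as anchors.
        anchors = [i for i, sent in enumerate(doc) if query & set(sent)]
        if not anchors:
            continue
        for i, sentence in enumerate(doc):
            distance = min(abs(i - a) for a in anchors)
            weight = math.exp(-decay * distance)
            for term in sentence:
                if term not in query:
                    scores[term] += weight
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Measuring distance in sentences rather than tokens means that every term in the same sentence as a query term receives the full weight, regardless of its position within that sentence.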

Frontiers in Research Metrics and Analytics | 2018

Is the Abstract a Mere Teaser? Evaluating Generosity of Article Abstracts in the Environmental Sciences

Liana Ermakova; Frédérique Bordignon; Nicolas Turenne; Marianne Noel

An abstract is not only a mirror of the full article; it also aims to draw attention to the most important information of the document it summarizes. Many studies have compared abstracts with full texts for their informativeness. In contrast to previous studies, we propose to investigate this relation based not only on the amount of information given by the abstract but also on its importance. The main objective of this paper is to introduce a new metric called GEM to measure the generosity or representativeness of an abstract. Schematically speaking, a generous abstract should have the best possible score of similarity for the sections important to the reader. Based on a questionnaire gathering information from 630 researchers, we were able to weight sections according to their importance. In our approach, seven sections were first automatically detected in the full text. The accuracy of this classification into sections was above 80% compared with a dataset of documents where sentences were assigned to sections by experts. Second, each section was weighted according to the questionnaire results. The GEM score was then calculated as a sum of weights of sections in the full text corresponding to sentences in the abstract normalized over the total sum of weights of sections in the full text. The correlation between GEM score and the mean of the scores assigned by annotators was higher than the correlation between scores from different experts. As a case study, the GEM score was calculated for 36,237 articles in environmental sciences (1930–2013) retrieved from the French ISTEX database. The main result was that GEM score has increased over time. Moreover, this trend depends on subject area and publisher. No correlation was found between GEM score and citation rate or open access status of articles. We conclude that abstracts are more generous in recent publications and cannot be considered as mere teasers. 
This research should be pursued in greater depth, particularly by examining structured abstracts. GEM score could be a valuable indicator for exploring large numbers of abstracts, by guiding the reader in his/her choice of whether or not to obtain and read full texts.
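The GEM computation described in this abstract can be sketched as follows. This is one reading of the definition under stated assumptions, not the authors' code: the function name and the section labels are hypothetical, each abstract sentence is assumed to have already been matched to one full-text section, and each covered section is counted once.

```python
def gem_score(section_weights, abstract_sentence_sections):
    """Compute a GEM-style score for one article.

    section_weights: dict mapping each section detected in the full text
        to its importance weight (for example, from questionnaire results).
    abstract_sentence_sections: one section label per abstract sentence,
        naming the full-text section that the sentence corresponds to.

    Returns the summed weight of the sections covered by the abstract,
    normalized by the total weight of all sections in the full text.
    """
    total = sum(section_weights.values())
    if total <= 0:
        raise ValueError("section weights must sum to a positive value")
    # Count each covered section once (an assumption of this sketch).
    covered = set(abstract_sentence_sections) & set(section_weights)
    return sum(section_weights[s] for s in covered) / total
```

With weights of 1, 2, 3, and 4 for four sections and an abstract whose sentences map only to the sections weighted 2 and 3, the score is (2 + 3) / 10 = 0.5.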

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

A Metric for Sentence Ordering Assessment Based on Topic-Comment Structure

Liana Ermakova; Josiane Mothe; Anton Firsov

Sentence ordering (SO) is a key component of verbal ability. It is also crucial for automatic text generation. While numerous researchers have developed methods to automatically evaluate the informativeness of produced content, the evaluation of readability is usually performed manually. In contrast, we present a self-sufficient metric for SO assessment based on the topic-comment structure of a text. We show that this metric has high accuracy.

CLEF (Online Working Notes/Labs/Workshop) | 2012

IRIT at INEX 2012: Tweet Contextualization

Liana Ermakova; Josiane Mothe

CLEF (Working Notes) | 2016

Cultural micro-blog Contextualization 2016 Workshop Overview: data and pilot tasks.

Liana Ermakova; Lorraine Goeuriot; Josiane Mothe; Philippe Mulhem; Jian-Yun Nie; Eric SanJuan

INitiative for the Evaluation of XML Retrieval (INEX), part of : Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2013) | 2013