Natalia V. Loukachevitch
Moscow State University
Publications
Featured research published by Natalia V. Loukachevitch.
North American Chapter of the Association for Computational Linguistics | 2016
Maria Pontiki; Dimitris Galanis; Haris Papageorgiou; Ion Androutsopoulos; Suresh Manandhar; Mohammad Al-Smadi; Mahmoud Al-Ayyoub; Yanyan Zhao; Bing Qin; Orphée De Clercq; Veronique Hoste; Marianna Apidianaki; Xavier Tannier; Natalia V. Loukachevitch; Evgeniy Kotelnikov; Núria Bel; Salud María Jiménez-Zafra; Gülşen Eryiğit
This paper describes the SemEval 2016 shared task on Aspect Based Sentiment Analysis (ABSA), a continuation of the corresponding tasks of 2014 and 2015. In its third year, the task provided 19 training and 20 testing datasets for 8 languages and 7 domains, as well as a common evaluation procedure. Of these datasets, 25 were for sentence-level and 14 for text-level ABSA; the latter was introduced as a SemEval subtask for the first time. The task attracted 245 submissions from 29 teams.
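As an illustration of how a common evaluation procedure can score one ABSA subtask, the sketch below computes micro-averaged precision, recall, and F1 over sets of (sentence, aspect category) pairs. It is a minimal sketch under assumptions: the function name `micro_f1`, the sentence IDs, and the `ENTITY#ATTRIBUTE`-style labels are illustrative, and it is not the official SemEval scorer.

```python
def micro_f1(gold, predicted):
    """Micro-averaged precision, recall, and F1 over (sentence_id, category) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy annotations (illustrative only): one missed category, one spurious one.
gold = {("s1", "FOOD#QUALITY"), ("s1", "SERVICE#GENERAL"), ("s2", "AMBIENCE#GENERAL")}
pred = {("s1", "FOOD#QUALITY"), ("s2", "AMBIENCE#GENERAL"), ("s2", "FOOD#PRICES")}
p, r, f = micro_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Pooling all pairs before computing F1 (micro-averaging) weights every annotated aspect equally, regardless of which sentence or dataset it comes from.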
Meeting of the Association for Computational Linguistics | 2014
Ilia Chetviorkin; Natalia V. Loukachevitch
In this study we explore a novel technique for creating polarity lexicons from Twitter streams in Russian and English. To this end, we first filter subjective tweets using general, domain-independent lexicons for each language. The subjective tweets are then used to extract domain-specific sentiment words. Relying on co-occurrence statistics of the extracted words in large unlabeled Twitter collections, we use the Markov random field framework to classify word polarity. To evaluate the quality of the resulting sentiment lexicons, we apply them to tweet sentiment classification, where they outperform previous results.
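The word-polarity step can be pictured as a binary pairwise Markov random field over the extracted words: each word takes label +1 or -1, a unary term pulls it toward a seed-lexicon prior, and co-occurrence weights pull neighboring words toward the same label. The sketch below assumes this Ising-style energy and uses iterated conditional modes (ICM) for inference; the potentials, the `coupling` parameter, the function name, and the toy data are assumptions for illustration, not the authors' exact model.

```python
def icm_polarity(words, unary, cooc, coupling=1.0, iterations=10):
    """
    Assumed energy: E(y) = -sum_w unary[w]*y_w - coupling * sum_(u,v) cooc[u,v]*y_u*y_v,
    with y_w in {+1, -1}. ICM sets each label to the sign of its local field,
    which minimizes E with all other labels held fixed (ties go to +1).
    """
    neighbors = {w: [] for w in words}
    for (u, v), weight in cooc.items():
        neighbors[u].append((v, weight))
        neighbors[v].append((u, weight))
    # Seed words start from the sign of their prior; unknown words start undecided (0)
    # so they do not vote before their first update.
    labels = {}
    for w in words:
        prior = unary.get(w, 0.0)
        labels[w] = 0 if prior == 0 else (1 if prior > 0 else -1)
    for _ in range(iterations):
        changed = False
        for w in words:
            field = unary.get(w, 0.0) + coupling * sum(
                weight * labels[n] for n, weight in neighbors[w]
            )
            new_label = 1 if field >= 0 else -1
            if new_label != labels[w]:
                labels[w] = new_label
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels


words = ["good", "awesome", "flawless", "bad", "awful"]
unary = {"good": 1.0, "bad": -1.0}  # hypothetical seed-lexicon priors
cooc = {("good", "awesome"): 2.0, ("awesome", "flawless"): 0.5, ("bad", "awful"): 2.0}
print(icm_polarity(words, unary, cooc))
```

ICM only finds a local minimum and depends on the update order; loopy belief propagation or graph cuts are common alternatives for the same kind of model.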
arXiv: Computation and Language | 2016
Alexander Panchenko; Dmitry Ustalov; Nikolay Arefyev; Denis Paperno; Natalia Konstantinova; Natalia V. Loukachevitch; Chris Biemann
Semantic relatedness of terms expresses their similarity of meaning as a numerical score. Humans make judgements about semantic relatedness easily, and this kind of information is also useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements, and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples \((\mathit{word}_i, \mathit{word}_j, \mathit{similarity}_{ij})\). Four of them are designed for evaluating systems that compute semantic relatedness, and they complement each other in terms of the semantic relation types they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth, high-coverage resource: the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
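Relatedness benchmarks made of \((\mathit{word}_i, \mathit{word}_j, \mathit{similarity}_{ij})\) triples are commonly scored by correlating a system's scores with the gold scores; Spearman's rank correlation is one standard choice. The sketch below assumes that metric (the shared task may use other measures for some benchmarks), and the word pairs, scores, and function names are made-up illustrations.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold, predicted):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rg, rp = average_ranks(gold), average_ranks(predicted)
    n = len(rg)
    mg, mp = sum(rg) / n, sum(rp) / n
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    var_g = sum((a - mg) ** 2 for a in rg)
    var_p = sum((b - mp) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_g * var_p) ** 0.5


# Toy benchmark triples and system scores (illustrative values only).
benchmark = [("автомобиль", "машина", 0.95), ("кот", "собака", 0.60), ("кот", "стол", 0.10)]
system = {("автомобиль", "машина"): 0.81, ("кот", "собака"): 0.40, ("кот", "стол"): 0.05}
gold = [score for _, _, score in benchmark]
pred = [system[(w1, w2)] for w1, w2, _ in benchmark]
print(spearman(gold, pred))
```

Only the ordering of pairs matters for this metric, so systems whose raw scores live on different scales can be compared directly.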
North American Chapter of the Association for Computational Linguistics | 2015
Michael Nokel; Natalia V. Loukachevitch
The paper describes the results of an empirical study of integrating bigram collocations, and the similarities between them and unigrams, into topic models. First, we propose a novel algorithm, PLSA-SIM, which modifies the original PLSA algorithm. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. Then we analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. All experiments were conducted on four text collections from different domains and languages. The experiments identify a subgroup of the tested measures whose top-ranked bigrams, when integrated into the PLSA-SIM algorithm, significantly improve topic model quality for all collections.
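Pointwise mutual information, \(\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}\), is one widely used word association measure for ranking bigram candidates. The sketch below ranks adjacent bigrams by PMI with a minimum-frequency cut-off; the choice of PMI, the `min_count` threshold, and the function name are assumptions, and whether PMI falls in the best-performing subgroup is not stated here.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent bigrams by PMI, skipping pairs seen fewer than min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so a frequency cut-off is usual
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = ["topic", "model", "is", "a", "topic", "model", "and", "a", "model"]
for bigram, score in pmi_bigrams(tokens):
    print(bigram, round(score, 3))
```

The top-k bigrams from such a ranking would then be added to the vocabulary of the topic model alongside the unigrams.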
Meeting of the Association for Computational Linguistics | 2016
Michael Nokel; Natalia V. Loukachevitch
The paper presents an empirical study of integrating n-grams and multi-word terms into topic models while maintaining similarities between them and words based on their component structure. First, we adapt the PLSA-SIM algorithm to the more widespread LDA model and to n-grams. Then we propose a novel algorithm, LDA-ITER, that allows the most suitable n-grams to be incorporated into topic models. Experiments on integrating n-grams and multi-word terms, conducted on five text collections in different languages and domains, demonstrate a significant improvement in all the metrics under consideration.
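A similarity relation "based on component structure" can be sketched by linking each single word to the multi-word terms that contain it. The sketch below assumes exactly that rule (a term is similar to each of its component words); how PLSA-SIM or LDA-ITER then use these sets during topic-model training is not reproduced, and the function name is hypothetical.

```python
def component_similarity_sets(vocabulary):
    """
    Map each single-word term to the multi-word terms containing it.
    Terms are space-separated strings; words with no multi-word term are dropped.
    """
    sets = {term: set() for term in vocabulary if " " not in term}
    for term in vocabulary:
        parts = term.split()
        if len(parts) < 2:
            continue
        for part in parts:
            if part in sets:  # only link to words that are themselves in the vocabulary
                sets[part].add(term)
    return {word: members for word, members in sets.items() if members}


vocab = ["topic", "model", "topic model", "probabilistic topic model", "corpus"]
print(component_similarity_sets(vocab))
```

Because "probabilistic" is not itself a vocabulary entry here, it gets no set, while "topic" and "model" are each linked to both multi-word terms.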
European Conference on Information Retrieval | 2013
Elena I. Bolshakova; Natalia V. Loukachevitch; Michael Nokel
The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization performs best among the models under study.
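NMF with KL-divergence minimization factorizes a nonnegative term-document matrix V into W·H by minimizing the generalized KL divergence; the classic solver uses Lee and Seung's multiplicative updates. The pure-Python sketch below implements those updates on a toy matrix; the matrix, rank, iteration count, and function names are assumptions, and how topic information is turned into term-extraction features is not shown.

```python
import math
import random


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH) = sum(v * log(v / r) - v + r)."""
    total = 0.0
    for v_row, r_row in zip(V, WH):
        for v, r in zip(v_row, r_row):
            total += (v * math.log((v + eps) / (r + eps)) if v > 0 else 0.0) - v + r
    return total


def nmf_kl(V, k, iterations=200, seed=0, eps=1e-12):
    """Factorize V (n x m) into W (n x k) and H (k x m) with KL multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        WH = matmul(W, H)
        for a in range(k):  # H_aj <- H_aj * sum_i W_ia V_ij / (WH)_ij / sum_i W_ia
            w_sum = sum(W[i][a] for i in range(n))
            for j in range(m):
                ratio = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(n))
                H[a][j] *= ratio / (w_sum + eps)
        WH = matmul(W, H)
        for a in range(k):  # W_ia <- W_ia * sum_j H_aj V_ij / (WH)_ij / sum_j H_aj
            h_sum = sum(H[a][j] for j in range(m))
            for i in range(n):
                ratio = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(m))
                W[i][a] *= ratio / (h_sum + eps)
    return W, H


# Toy term-document counts: rows are candidate terms, columns are documents.
V = [[3, 0, 1], [2, 0, 1], [0, 4, 0], [0, 3, 1]]
W0, H0 = nmf_kl(V, k=2, iterations=0)  # same seed, so this is the initial state
W, H = nmf_kl(V, k=2, iterations=200)
print(kl_divergence(V, matmul(W0, H0)), kl_divergence(V, matmul(W, H)))
```

Multiplicative updates keep every entry nonnegative and never increase the divergence, which is why they are the usual baseline solver for this objective.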
International Conference on Analysis of Images, Social Networks and Texts | 2016
Valerie Mozharova; Natalia V. Loukachevitch
Current machine-learning approaches to information extraction often include features based on large volumes of knowledge in the form of gazetteers, word clusters, etc. In this paper we consider a CRF-based approach to Russian named entity recognition that relies on multiple lexicons. We test our system on the open Russian collections “Persons-1000” and “Persons-1111”, which are labeled with personal names. We additionally annotated the “Persons-1000” collection with names of organizations, media, locations, and geo-political entities, and we present experimental results for one type of names (Persons, for comparison purposes), for three types (Persons, Organizations, and Locations), and for five types. We also compare two labeling schemes for Russian: the IO scheme and the BIO scheme.
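The difference between the two labeling schemes shows up as soon as entity spans are converted to per-token tags: BIO marks the first token of each entity with `B-`, while IO uses `I-` throughout, so adjacent entities of the same type merge. The sketch below assumes end-exclusive, non-overlapping spans and uses hypothetical `PER`/`LOC` labels and a hypothetical function name.

```python
def spans_to_tags(tokens, spans, scheme="BIO"):
    """
    Convert entity spans (start, end, type), end exclusive, to per-token tags.
    BIO prefixes the first token of each entity with B-; IO uses I- everywhere.
    """
    if scheme not in ("IO", "BIO"):
        raise ValueError("scheme must be 'IO' or 'BIO'")
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        for i in range(start, end):
            prefix = "B" if scheme == "BIO" and i == start else "I"
            tags[i] = f"{prefix}-{entity_type}"
    return tags


tokens = ["Иван", "Петров", "посетил", "Москву"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(spans_to_tags(tokens, spans, "BIO"))
print(spans_to_tags(tokens, spans, "IO"))
```

IO halves the number of entity labels a CRF must learn, at the cost of being unable to separate two adjacent names of the same type.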
2016 International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT) | 2016
Valerie Mozharova; Natalia V. Loukachevitch
In this article we consider a two-stage prediction approach to named entity recognition in Russian. In the first stage, named entities are extracted by a machine learning method. After that, our system collects statistics on token classes and transforms these statistics into a feature set, which is used to train a new classifier. We consider three types of second-stage features: the previous history, whole-document statistics, and global statistics of the whole collection. We carry out our experiments on several text collections and show that the two-stage prediction approach improves the quality of named entity recognition.
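One plausible reading of whole-document statistics is, for each token, the share of its occurrences in the document that the first-stage classifier assigned to each class. The sketch below computes such features from first-stage labels under that assumption (case-insensitive token matching, hypothetical function and feature names); the exact features used in the paper may differ.

```python
from collections import Counter, defaultdict


def document_label_stats(tokens, first_stage_labels, classes):
    """
    For each token position, the share of that token's occurrences in the
    document that the first stage labeled with each class.
    """
    counts = defaultdict(Counter)
    for token, label in zip(tokens, first_stage_labels):
        counts[token.lower()][label] += 1
    features = []
    for token in tokens:
        token_counts = counts[token.lower()]
        total = sum(token_counts.values())
        features.append({f"doc_share_{cls}": token_counts[cls] / total for cls in classes})
    return features


tokens = ["Петров", "сказал", "Петров", "ушёл"]
first_stage = ["PER", "O", "O", "O"]  # the second mention was missed in stage one
for token, feats in zip(tokens, document_label_stats(tokens, first_stage, ["PER", "O"])):
    print(token, feats)
```

A second-stage classifier trained on these features can learn that a token tagged as a person elsewhere in the same document is likely a person here too.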
Text, Speech, and Dialogue | 2015
Natalia V. Loukachevitch; Yuliya Rubtsova
This paper summarizes the results of the reputation-oriented Twitter task, which was held as part of the SentiRuEval evaluation of Russian sentiment-analysis systems. Tweets from two domains, telecom companies and banks, were included in the evaluation. The task was to determine whether the author of a tweet has a positive or negative attitude toward a company mentioned in the message. The main aim of this paper is to analyze the current state of the approaches applied by the participants and the problems they face.
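A common way to score this kind of two-polarity task is to average F1 over the positive and negative classes, with neutral items present in the data but not averaged. The sketch below assumes that metric and string labels `positive`/`negative`/`neutral`; the function names are hypothetical and it is not presented as the official SentiRuEval scorer.

```python
def f1_for_class(gold, pred, cls):
    """F1 of one class, treating every other label as negative evidence."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_pos_neg(gold, pred):
    """Average of positive-class and negative-class F1; neutral is not averaged."""
    return (f1_for_class(gold, pred, "positive") + f1_for_class(gold, pred, "negative")) / 2


gold = ["positive", "negative", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "negative"]
print(round(macro_f1_pos_neg(gold, pred), 4))
```

Leaving neutral out of the average keeps a system from scoring well simply by predicting the (often dominant) neutral class.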
Cross-Language Evaluation Forum | 2005
Mikhail Ageev; Boris V. Dobrov; Natalia V. Loukachevitch
In our CLEF 2005 experiments we used a bilingual Russian-English Socio-Political Thesaurus, which we have developed over more than 10 years as a tool for automatic text processing in information retrieval tasks. The same resource and the same algorithms were used for the ad-hoc and domain-specific tasks.
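To illustrate how a bilingual concept-based thesaurus can support cross-language retrieval, the sketch below matches the longest Russian thesaurus entries in a query, maps them to concepts, and outputs the English entries of those concepts. The toy thesaurus, concept IDs, and function names are hypothetical, and the sketch assumes already-lemmatized input; it is not the algorithm used in the CLEF runs.

```python
# Hypothetical toy thesaurus: each concept lists its Russian and English entries.
THESAURUS = {
    "C1": {"ru": ["выборы", "избирательная кампания"], "en": ["elections", "election campaign"]},
    "C2": {"ru": ["парламент"], "en": ["parliament"]},
}


def build_index(thesaurus, lang):
    """Map every entry of one language to its concept ID."""
    return {term: concept for concept, entries in thesaurus.items() for term in entries[lang]}


def translate_query(query, thesaurus, src="ru", tgt="en"):
    """Greedy longest match of source entries, then target entries of matched concepts."""
    index = build_index(thesaurus, src)
    max_len = max(len(term.split()) for term in index)
    words = query.lower().split()
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                concepts.append(index[phrase])
                i += n
                break
        else:
            i += 1  # no thesaurus entry starts at this word; skip it
    terms = []
    for concept in concepts:
        for term in thesaurus[concept][tgt]:
            if term not in terms:
                terms.append(term)
    return concepts, terms


print(translate_query("избирательная кампания и парламент", THESAURUS))
```

Matching the longest entry first lets a multi-word entry such as "избирательная кампания" win over any shorter entry it contains, and going through concepts rather than word-to-word dictionaries also brings in synonyms on the target side.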