Dmitry Ustalov | Researchain

Publication

Featured research published by Dmitry Ustalov.

Conference of the European Chapter of the Association for Computational Linguistics | 2014

A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus

Pavel Braslavski; Dmitry Ustalov; Mikhail Mukhin

The YARN (Yet Another RussNet) project, started in 2013, aims to create a large open thesaurus for Russian using crowdsourcing. This paper describes the synset assembly interface developed within the project: the motivation behind it, design, usage scenarios, implementation details, and first experimental results.

Meeting of the Association for Computational Linguistics | 2017

Watset: automatic induction of synsets from a graph of synonyms

Dmitry Ustalov; Alexander Panchenko; Chris Biemann

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources.
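The three steps in the abstract above (synonym graph, word sense induction, clustering of the disambiguated graph) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an unweighted graph without self-loops, uses connected components as a stand-in for the hard clustering algorithm, and picks a neighbour's sense simply as the one whose context contains the current word. The names `components` and `induce_synsets` are hypothetical.

```python
def components(nodes, edges):
    """Hard clustering stand-in (an assumption): connected components."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(adj[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


def induce_synsets(edges):
    """Sketch of sense-aware synset induction over (word, word) synonym pairs."""
    # Step 1: build the (here unweighted) synonym graph.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Step 2: word sense induction. Cluster each word's neighbourhood
    # with the word itself removed; every cluster is one sense context.
    senses = {}
    for word, ego in neighbours.items():
        ego_edges = [(a, b) for a in ego for b in neighbours[a]
                     if b in ego and a < b]
        senses[word] = components(ego, ego_edges)

    # sense_of[w][n]: index of the sense of w whose context contains n.
    sense_of = {w: {n: i for i, ctx in enumerate(ctxs) for n in ctx}
                for w, ctxs in senses.items()}

    # Step 3: rebuild the graph over (word, sense index) nodes and
    # cluster it; mapping the clusters back to words gives synsets.
    sense_nodes = [(w, i) for w, ctxs in senses.items()
                   for i in range(len(ctxs))]
    sense_edges = [((u, sense_of[u][v]), (v, sense_of[v][u]))
                   for u, v in edges]
    return [{w for w, _ in cluster}
            for cluster in components(sense_nodes, sense_edges)]


edges = [
    ("bank", "shore"), ("bank", "riverside"), ("shore", "riverside"),
    ("bank", "lender"), ("bank", "depository"), ("lender", "depository"),
]
synsets = induce_synsets(edges)
```

In this toy input the ambiguous word "bank" ends up in two synsets, which shows how a hard clustering of the sense graph yields a fuzzy clustering of the original words.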

arXiv: Computation and Language | 2016

Human and Machine Judgements for Russian Semantic Relatedness

Alexander Panchenko; Dmitry Ustalov; Nikolay Arefyev; Denis Paperno; Natalia Konstantinova; Natalia V. Loukachevitch; Chris Biemann

Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples \(({word}_{i}, {word}_{j}, {similarity}_{ij})\). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.

Empirical Methods in Natural Language Processing | 2017

Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation

Alexander Panchenko; Fide Marten; Eugen Ruppert; Stefano Faralli; Dmitry Ustalov; Simone Paolo Ponzetto; Chris Biemann

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

Artificial Intelligence and Natural Language | 2015

An information retrieval system for technology analysis and forecasting

Nikita Nikitinsky; Dmitry Ustalov; Sergey Shashev

Expert evaluation of grant proposals and research projects is often facilitated by specialized decision support systems, which analyze research and industry trends in a large domain-dependent text corpus. Although production-grade technological forecasting systems exist for English, Russian patent databases and citation indexes have been developed in isolation from the global ones. This complicates technology analysis and forecasting in research conducted in Russia. In this paper, we present a scientific information retrieval system designed for the Russian language. The system uses patents, research papers and government contracts to facilitate the expert evaluation process by providing the experts with relevant documents. Comparison of our system with a popular baseline shows promising results.

Advanced Industrial Conference on Telecommunications | 2015

Add-Remove-Confirm: Crowdsourcing synset cleansing

Dmitry Ustalov; Yuri Kiselev

A thesaurus is a crucial resource for many natural language processing and artificial intelligence problems that require common sense reasoning. Special effort is needed to ensure high synset quality when a thesaurus is created collaboratively by non-expert annotators. This paper proposes Add-Remove-Confirm, a novel workflow for crowdsourcing synset cleansing. The workflow has been empirically evaluated on a Russian thesaurus created through crowdsourcing, showing that it improves synset quality according to expert assessment with a high level of agreement.

International Conference on Analysis of Images, Social Networks and Texts | 2015

TagBag: Annotating a Foreign Language Lexical Resource with Pictures

Dmitry Ustalov

Forms of art such as photography and drawing may serve as a uniform language that represents things we can either see or imagine. Hence, it is reasonable to use such pictures to connect nouns of natural languages by their meanings. This paper studies the mapping of noun images from an annotated collection to the word senses of a foreign language lexical resource by means of a bilingual dictionary. In this study, the English-Russian dictionary by V.K. Mueller is used to enhance the Yet Another RussNet synsets with Flickr photos.

Artificial Intelligence and Natural Language | 2015

Crowdsourcing synset relations with Genus-Species-Match

Dmitry Ustalov

Using a domain-specific lexical resource is useful for improving the performance of a natural language processing system. However, such resources may be represented in the form of glossaries: terms provided with their sense definitions. Although the problem of integrating such domain-specific glossaries into more sophisticated general-purpose resources like thesauri is highly topical, it is complicated by the ambiguity of the individual terms. This paper presents Genus-Species-Match, a crowdsourcing workflow for matching noisy pairs of synsets representing hyponymic/hypernymic relations. The system demonstrates an F1 score of 80% in an experiment conducted on an online labor marketplace using the EMERCOM glossary and the Yet Another RussNet sense inventory.

International Conference on Analysis of Images, Social Networks and Texts | 2017

Fighting with the Sparsity of Synonymy Dictionaries for Automatic Synset Induction

Dmitry Ustalov; Mikhail Chernoskutov; Chris Biemann; Alexander Panchenko

Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one performs a pre-processing of the graph by adding missing edges, while the second one performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the problem of sparsity of the synonymy dictionaries.

2017 Siberian Symposium on Data Science and Engineering (SSDSE) | 2017

Synonymy graph connectivity in graph-based word sense induction

Mikhail Chernoskutov; Dmitry Ustalov

In this paper, we present an approach for synonymy graph augmentation. The approach is based on the equivalence property of the synonymy relation and implies the addition of the missing transitive edges between the potential synonyms in the input synonymy graph. We also conduct a preliminary evaluation of this approach on two datasets for the Russian language and show that it increases the quality of graph clustering compared to the non-augmented input graph.
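The idea of adding missing transitive edges can be shown with a small sketch. It assumes an undirected, unweighted graph without self-loops and performs a single pass: whenever two words share a common synonym, an edge between them is added. The function name `augment_transitive` is hypothetical, and the paper's actual procedure (for example, how potential synonyms are filtered) may differ.

```python
from itertools import combinations


def augment_transitive(edges):
    """Return the input edges plus one round of missing transitive edges.

    Edges are stored as frozensets so that (a, b) and (b, a) coincide.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    augmented = {frozenset(edge) for edge in edges}
    for adjacent in neighbours.values():
        # If a ~ x and x ~ b, treat a ~ b as a potential synonym pair.
        for a, b in combinations(sorted(adjacent), 2):
            augmented.add(frozenset((a, b)))
    return augmented


edges = [("car", "auto"), ("auto", "automobile")]
augmented = augment_transitive(edges)
```

Here "car" and "automobile" share the synonym "auto", so the sketch adds the edge between them while keeping the two original edges.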
