Ruslan Mitkov
University of Wolverhampton
Publication
Featured research published by Ruslan Mitkov.
Meeting of the Association for Computational Linguistics | 1998
Ruslan Mitkov
Most traditional approaches to anaphora resolution rely heavily on linguistic and domain knowledge. One of the disadvantages of developing a knowledge-based system, however, is that it is a very labour-intensive and time-consuming task. This paper presents a robust, knowledge-poor approach to resolving pronouns in technical manuals, which operates on texts pre-processed by a part-of-speech tagger. Input is checked against agreement and for a number of antecedent indicators. Candidates are assigned scores by each indicator and the candidate with the highest score is returned as the antecedent. Evaluation reports a success rate of 89.7% which is better than the success rates of the approaches selected for comparison and tested on the same data. In addition, preliminary experiments show that the approach can be successfully adapted for other languages with minimum modifications.
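The scoring scheme described in this abstract can be illustrated with a minimal sketch. The three indicators, their score values, the `Candidate` fields and the tie-break on recency are all hypothetical choices made for illustration; the abstract does not list the actual indicators or their weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    gender: str        # "m", "f" or "n"
    number: str        # "sg" or "pl"
    definite: bool     # noun phrase is definite
    in_heading: bool   # noun phrase also occurs in the section heading
    distance: int      # sentences between the candidate and the pronoun


# Hypothetical indicators: each maps a candidate to a small integer score.
INDICATORS: List[Callable[[Candidate], int]] = [
    lambda c: 0 if c.definite else -1,   # penalise indefinite noun phrases
    lambda c: 1 if c.in_heading else 0,  # reward noun phrases from the heading
    lambda c: max(0, 2 - c.distance),    # reward recently mentioned candidates
]


def agrees(pronoun: Dict[str, str], cand: Candidate) -> bool:
    """Gender and number agreement check between pronoun and candidate."""
    return pronoun["gender"] == cand.gender and pronoun["number"] == cand.number


def resolve(pronoun: Dict[str, str], candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the agreeing candidate with the highest aggregate score.

    Ties are broken in favour of the most recent candidate (an assumption).
    """
    compatible = [c for c in candidates if agrees(pronoun, c)]
    if not compatible:
        return None
    return max(
        compatible,
        key=lambda c: (sum(indicator(c) for indicator in INDICATORS), -c.distance),
    )


pronoun_it = {"gender": "n", "number": "sg"}
candidates = [
    Candidate("the printer", "n", "sg", definite=True, in_heading=True, distance=1),
    Candidate("a cable", "n", "sg", definite=False, in_heading=False, distance=0),
    Candidate("the buttons", "n", "pl", definite=True, in_heading=False, distance=0),
]
best = resolve(pronoun_it, candidates)
print(best.text if best else None)
```

Here "the buttons" is filtered out by the agreement check, and "the printer" outscores "a cable" because the heading and definiteness indicators outweigh the recency advantage of the latter.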
Natural Language Engineering | 2006
Ruslan Mitkov; Le An Ha; Nikiforos Karamanis
This paper describes a novel computer-aided procedure for generating multiple-choice test items from electronic documents. In addition to employing various Natural Language Processing techniques, including shallow parsing, automatic term extraction, sentence transformation and computing of semantic distance, the system makes use of language resources such as corpora and ontologies. It identifies important concepts in the text and generates questions about these concepts as well as multiple-choice distractors, offering the user the option to post-edit the test items by means of a user-friendly interface. In assisting test developers to produce items in a fast and expedient manner without compromising quality, the tool saves both time and production costs.
international conference natural language processing | 2003
Ruslan Mitkov
The paper describes a novel automatic procedure for the generation of multiple-choice tests from electronic documents. In addition to employing various NLP techniques including term extraction and shallow parsing, the system makes use of language resources such as corpora and ontologies. The system operates in a fully automatic mode and also in a semi-automatic environment where the user is offered the option to post-edit the generated test items. The results of the evaluation suggest that the new procedure is very effective, saving considerable time and labour, and that the test items produced with the help of the program are not of inferior quality to those produced manually.
International Conference on Computational Linguistics | 2002
Ruslan Mitkov; Richard Evans; Constantin Orasan
This paper describes a new, advanced and completely revamped version of Mitkov's knowledge-poor approach to pronoun resolution [21]. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of non-nominal anaphora (including pleonastic pronouns) and for recognition of animacy, and employs genetic algorithms to achieve optimal performance. The paper features extensive evaluation and discusses important evaluation issues in anaphora resolution.
ANARESOLUTION '97 Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts | 1997
Ruslan Mitkov
The paper discusses the significance of factors in anaphora resolution and, on the basis of a comparative study, argues that what matters is not only a good set of reliable factors but also the strategy for their application. The objective of the study was to find out how well the same set of factors worked within two different computational strategies. To this end, we tuned two anaphora resolution approaches to use the same core set of factors. The first approach uses constraints to discount implausible candidates and then consults preferences to rank the remaining candidates and select the most likely one. The second employs only preferences and does not discard any candidate but assumes initially that the candidate examined is the antecedent; on the basis of an uncertainty reasoning formula, this hypothesis is either rejected or accepted. The last section of the paper addresses some related unresolved issues which need further research.
Conference on Intelligent Text Processing and Computational Linguistics | 2001
Ruslan Mitkov
This paper argues that even though there has been considerable advance in the research in anaphora resolution over the last 10 years, there are still a number of outstanding issues. The paper discusses several of these issues and outlines some of the work underway to address them with particular reference to the work carried out by the author’s research team.
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics | 2009
Ruslan Mitkov; Le An Ha; Andrea Varga; Luz Rello
Mitkov and Ha (2003) and Mitkov et al. (2006) offered an alternative to the lengthy and demanding activity of developing multiple-choice test items by proposing an NLP-based methodology for construction of test items from instructive texts such as textbook chapters and encyclopaedia entries. One of the interesting research questions which emerged during these projects was how better quality distractors could automatically be chosen. This paper reports the results of a study seeking to establish which similarity measures generate better quality distractors of multiple-choice tests. Similarity measures employed in the procedure of selection of distractors are collocation patterns, four different methods of WordNet-based semantic similarity (extended gloss overlap measure, Leacock and Chodorow's, Jiang and Conrath's as well as Lin's measures), distributional similarity, phonetic similarity as well as a mixed strategy combining the aforementioned measures. The evaluation results show that the methods based on Lin's measure and on the mixed strategy outperform the rest, albeit not in a statistically significant fashion.
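As an illustration of similarity-based distractor selection, the sketch below ranks candidate distractors by Lin's measure, sim(a, b) = 2 · IC(lcs(a, b)) / (IC(a) + IC(b)), where IC is information content and lcs is the lowest common subsumer. The toy taxonomy and the corpus counts are invented for this example and merely stand in for WordNet and real frequency data; the paper's actual selection procedure is not specified in the abstract.

```python
import math
from typing import Dict, List

# Hypothetical toy taxonomy (child -> parent) and made-up corpus counts.
PARENT: Dict[str, str] = {
    "noun": "word",
    "verb": "word",
    "pronoun": "noun",
    "proper noun": "noun",
    "auxiliary": "verb",
}
COUNTS: Dict[str, int] = {
    "word": 0, "noun": 10, "verb": 10,
    "pronoun": 20, "proper noun": 20, "auxiliary": 40,
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def information_content(concept: str) -> float:
    """IC(c) = -log p(c), where p(c) counts c and everything it subsumes."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subsumed / total)


def lin_similarity(a: str, b: str) -> float:
    """Lin's measure computed over the toy taxonomy."""
    b_chain = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in b_chain)
    denominator = information_content(a) + information_content(b)
    if denominator == 0:
        return 0.0
    return 2 * information_content(lcs) / denominator


def rank_distractors(answer: str, pool: List[str]) -> List[str]:
    """Order candidate distractors from most to least similar to the answer."""
    return sorted(pool, key=lambda term: lin_similarity(answer, term), reverse=True)


print(rank_distractors("pronoun", ["auxiliary", "proper noun", "verb"]))
```

A mixed strategy of the kind mentioned in the abstract could replace `lin_similarity` in the sort key with a combination of several such functions; how the measures are combined in the paper is not stated here, so that step is left out.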
Meeting of the Association for Computational Linguistics | 2001
Catalina Barbu; Ruslan Mitkov
In this paper we argue that comparative evaluation in anaphora resolution has to be performed using the same pre-processing tools and on the same set of data. The paper proposes an evaluation environment for comparing anaphora resolution algorithms which is illustrated by presenting the results of the comparative evaluation of three methods on the basis of several evaluation measures.
International Conference on Computational Linguistics | 2010
Iustina Ilisei; Diana Inkpen; Gloria Corpas Pastor; Ruslan Mitkov
This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach a success rate of up to 97.62% on a technical dataset. Moreover, the SVM classifier consistently reports a statistically significant improvement in accuracy when the learning system benefits from the addition of simplification features to the basic translational classifier system. Therefore, these findings may be considered an argument for the existence of the Simplification Universal.
Machine Translation | 2006
Viktor Pekar; Ruslan Mitkov; Dimitar Blagoev; Andrea Mulloni
Statistical methods to extract translational equivalents from non-parallel corpora hold the promise of ensuring the required coverage and domain customisation of lexicons as well as accelerating their compilation and maintenance. A challenge for these methods is posed by rare words and expressions, which often have low corpus frequencies. However, it is rare words such as newly introduced terminology and named entities that present the main interest for practical lexical acquisition. In this article, we study possibilities of improving the extraction of low-frequency equivalents from bilingual comparable corpora. Our work is carried out in the general framework which discovers equivalences between words of different languages using similarities between their occurrence patterns found in respective monolingual corpora. We develop a method that aims to compensate for insufficient amounts of corpus evidence on rare words: prior to measuring cross-language similarities, the method uses same-language corpus data to model co-occurrence vectors of rare words by predicting their unseen co-occurrences and smoothing rare, unreliable ones. Our experimental evaluation demonstrates that the proposed method delivers a consistent and significant improvement over the conventional approach to this task.
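One way to picture the smoothing step is nearest-neighbour interpolation: a rare word's sparse co-occurrence vector is blended with a similarity-weighted average of the vectors of its closest same-language neighbours. The sketch below is only an assumption about how such smoothing might look; the function names, the cosine measure, and the parameters `k` and `lam` are illustrative and are not taken from the article.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # context word -> co-occurrence weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth(rare: Vector, neighbours: List[Vector], k: int = 2, lam: float = 0.5) -> Vector:
    """Interpolate a rare word's vector with its k most similar neighbours.

    lam weights the observed counts; (1 - lam) weights the neighbour estimate.
    Features seen only with neighbours become predicted (previously unseen)
    co-occurrences of the rare word.
    """
    ranked = sorted(neighbours, key=lambda n: cosine(rare, n), reverse=True)[:k]
    weights = [cosine(rare, n) for n in ranked]
    total = sum(weights)
    if total == 0:
        return dict(rare)  # no usable neighbours: leave the vector unchanged
    features = set(rare).union(*ranked)
    smoothed: Vector = {}
    for feature in features:
        estimate = sum(w * n.get(feature, 0.0) for w, n in zip(weights, ranked)) / total
        smoothed[feature] = lam * rare.get(feature, 0.0) + (1 - lam) * estimate
    return smoothed


# Toy data: a rare word seen once with "install", plus two frequent words.
rare_vector = {"install": 1.0}
frequent_vectors = [
    {"install": 4.0, "configure": 3.0},
    {"eat": 5.0},
]
smoothed_vector = smooth(rare_vector, frequent_vectors)
print(sorted(smoothed_vector.items()))
```

In this toy run the rare word acquires a non-zero weight for "configure", a context it was never observed with, while the unrelated neighbour contributes nothing because its similarity weight is zero. The smoothed vectors would then feed into the cross-language comparison in place of the raw ones.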