Desmond Elliott
University of Glasgow
Publications
Featured research published by Desmond Elliott.
Meeting of the Association for Computational Linguistics | 2014
Desmond Elliott; Frank Keller
Image description is a new natural language generation task, where the aim is to generate a human-like description of an image. The evaluation of computer-generated text is a notoriously difficult problem; nevertheless, the quality of image descriptions has typically been measured using unigram BLEU and human judgements. The focus of this paper is to determine the correlation of automatic measures with human judgements for this task. We estimate the correlation of unigram and Smoothed BLEU, TER, ROUGE-SU4, and Meteor against human judgements on two data sets. The main finding is that unigram BLEU correlates only weakly with human judgements, whereas Meteor correlates most strongly.
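As a rough illustration of the evaluation setup described above, the sketch below computes a clipped unigram precision (the core of unigram BLEU) and a hand-rolled Pearson correlation between metric scores and human judgements. The data and function names are invented for illustration; this is not the paper's code.

```python
# Illustrative sketch: correlating an automatic metric with human judgements.
import math

def unigram_precision(candidate, reference):
    """Clipped unigram precision: each reference word can match at most once."""
    cand = candidate.lower().split()
    ref_counts = {}
    for w in reference.lower().split():
        ref_counts[w] = ref_counts.get(w, 0) + 1
    matches = 0
    for w in cand:
        if ref_counts.get(w, 0) > 0:
            matches += 1
            ref_counts[w] -= 1
    return matches / len(cand) if cand else 0.0

def pearson(xs, ys):
    """Pearson's r between two equal-length score lists, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: metric scores vs. human judgements for five image descriptions.
metric = [0.8, 0.5, 0.9, 0.3, 0.6]
human  = [4.0, 3.0, 5.0, 2.0, 3.5]
r = pearson(metric, human)
```

In practice one would use an established implementation (e.g. a library BLEU scorer and a statistics package), but the hand-rolled version makes the correlation analysis transparent.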
Journal of Artificial Intelligence Research | 2016
Raffaella Bernardi; Ruket Cakici; Desmond Elliott; Aykut Erdem; Erkut Erdem; Nazli Ikizler-Cinbis; Frank Keller; Adrian Muscat; Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description either as a generation problem or as a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their advantages and disadvantages. Moreover, we give an overview of the benchmark image datasets and the evaluation measures that have been developed to assess the quality of machine-generated image descriptions. Finally, we outline future directions in the area of automatic image description generation.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Lucia Specia; Stella Frank; Khalil Sima'an; Desmond Elliott
This paper introduces and summarises the findings of a new shared task at the intersection of Natural Language Processing and Computer Vision: the generation of image descriptions in a target language, given an image and/or one or more descriptions in a different (source) language. This challenge was organised along with the Conference on Machine Translation (WMT16), and called for system submissions for two task variants: (i) a translation task, in which a source language image description needs to be translated to a target language, (optionally) with additional cues from the corresponding image, and (ii) a description generation task, in which a target language description needs to be generated for an image, (optionally) with additional cues from source language descriptions of the same image. In this first edition of the shared task, sixteen systems were submitted for the translation task and seven for the description generation task, from a total of ten teams.
Meeting of the Association for Computational Linguistics | 2016
Desmond Elliott; Stella Frank; Khalil Sima'an; Lucia Specia
We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.
International Joint Conference on Natural Language Processing | 2015
Desmond Elliott; Arjen P. de Vries
The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections to produce training data. The description of an unseen image is produced by first predicting its VDR over automatically detected objects, and then generating the text with a template-based generation model using the predicted VDR. The performance of our approach is comparable to a state-of-the-art multimodal deep neural network in images depicting actions.
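The template-based generation step described above can be sketched as follows: each predicted spatial relation between two detected objects is verbalised with a fixed template. The relation names and templates here are hypothetical illustrations, not the paper's actual inventory.

```python
# Hypothetical sketch of template-based generation from a predicted VDR:
# each (subject, spatial relation, object) arc is rendered with a template.

TEMPLATES = {
    "beside": "A {subj} is beside a {obj}.",
    "on":     "A {subj} is on a {obj}.",
    "above":  "A {subj} is above a {obj}.",
}

def describe(vdr_arcs):
    """Render each dependency arc of the VDR with its relation's template."""
    sentences = []
    for subj, relation, obj in vdr_arcs:
        template = TEMPLATES.get(relation, "A {subj} is near a {obj}.")
        sentences.append(template.format(subj=subj, obj=obj))
    return " ".join(sentences)

text = describe([("person", "beside", "bicycle"), ("dog", "on", "sofa")])
# → "A person is beside a bicycle. A dog is on a sofa."
```

The appeal of this design is that generation quality then depends mainly on the quality of the object detections and the predicted VDR, not on a learned language model.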
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Iacer Calixto; Desmond Elliott; Stella Frank
We present a doubly-attentive multimodal machine translation model. Our model learns to attend to source-language words and to spatially-preserving CONV5,4 visual features via two separate attention mechanisms in a neural translation model. In image description translation experiments (Task 1), we find an improvement of 2.3 Meteor points compared to initialising the hidden state of the decoder with only the FC7 features, and 2.9 Meteor points compared to a text-only neural machine translation baseline, confirming the benefit of attending to the CONV5,4 features.
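The core operation behind each attention mechanism can be sketched in a few lines: score the decoder state against every spatial feature vector, normalise the scores with a softmax, and take the weighted sum as the context vector. This is a generic dot-product soft-attention sketch under assumed toy dimensions, not the authors' implementation.

```python
# Minimal soft-attention sketch: scores -> softmax weights -> context vector.
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, annotations):
    """Dot-product attention of a decoder state over spatial feature vectors."""
    scores = [sum(q * a for q, a in zip(query, ann)) for ann in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * ann[d] for w, ann in zip(weights, annotations))
               for d in range(dim)]
    return weights, context

# Toy: three spatial locations with 2-d features, attended by a 2-d state.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, annotations)
```

A doubly-attentive decoder would run one such mechanism over the source words and a second over the spatial visual features, concatenating the two context vectors at each decoding step.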
Information Interaction in Context | 2010
Richard Glassey; Desmond Elliott; Tamara Polajnar; Leif Azzopardi
This paper presents an interaction-based information filtering system designed for the needs of children accessing multiple streams of information. This is an emerging problem due to the increased information access and engagement by children for their education and entertainment, and the explosion of stream-based information sources on most topics. It has been shown that children have difficulty formulating text-based queries and using interfaces primarily designed for adults. The in-progress system presented in this paper attempts to address these difficulties by employing an interaction-based interface that simplifies the expression of information needs and adapts itself to user interests over time. To overcome issues of content moderation, the system aggregates multiple child-friendly information feeds and performs offline processing to facilitate topic filtering. A set of standing topics is created for initial interaction, and subsequent interactions are used to infer and refine which topics the child would most likely want to have presented. A simple and easy-to-use interface uses relevance information to determine the display size of each document title, which acts as a relevance cue to the user. The planned research focuses on validating the interaction-based approach with both child and adult populations to discover the differences and similarities that may exist.
Conference on Image and Video Retrieval | 2009
Thierry Urruty; Frank Hopfgartner; David Hannah; Desmond Elliott; Joemon M. Jose
In this paper, we present a novel video search interface based on the concept of aspect browsing. The proposed strategy is to assist the user in exploratory video search by actively suggesting new query terms and video shots. Our approach has the potential to narrow the Semantic Gap by allowing users to explore the data collection. First, we describe a clustering technique to identify potential aspects of a search. Then, we use the results to propose suggestions that help the user in their search task. Finally, we analyse this approach using the log files and feedback from a user study.
Conference on Information and Knowledge Management | 2009
Desmond Elliott; Joemon M. Jose
We present a personalised retrieval system that captures explicit relevance feedback to build an evolving user profile with multiple aspects. The user profile is used to proactively retrieve results between search sessions to support multi-session search tasks. This approach to supporting users with their multi-session search tasks is evaluated in a between-subjects multiple time-series study with ten subjects performing two simulated work situation tasks over five sessions. System interaction data shows that subjects using the personalised retrieval system issue fewer queries and interact with fewer results than subjects using a baseline system. The interaction data also shows a trend of subjects interacting with the proactively retrieved results in the personalised retrieval system.
Meeting of the Association for Computational Linguistics | 2016
Emiel van Miltenburg; Roser Morante; Desmond Elliott
We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc.) in the Flickr30K corpus, and a categorization of negation uses. Based on this analysis, we provide a set of requirements that an image description system should meet in order to generate negated sentences. As a pilot experiment, we used our categorization to manually annotate sentences containing negations in the Flickr30K corpus, reaching an agreement score of K=0.67. With this paper, we hope to open up a broader discussion of subjective language in image descriptions.
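The agreement score K reported above is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below computes it from two annotators' label sequences; the toy labels are invented for illustration and are not the paper's annotation data.

```python
# Illustrative sketch of Cohen's kappa for inter-annotator agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy labels from two hypothetical annotators over six sentences.
a = ["neg", "none", "neg", "none", "neg", "neg"]
b = ["neg", "none", "none", "none", "neg", "neg"]
k = cohens_kappa(a, b)
```

For a real study one would typically use a library implementation (e.g. scikit-learn's `cohen_kappa_score`), but the formula itself is this simple.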