Josiah Wang
University of Sheffield
Publication
Featured research published by Josiah Wang.
british machine vision conference | 2009
Josiah Wang; Katja Markert; Mark Everingham
We investigate the task of learning models for visual object recognition from natural language descriptions alone. The approach contributes to the recognition of fine-grained object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions of visual attributes are readily available. As an example we tackle recognition of butterfly species, learning models from descriptions in an online nature guide. We propose natural language processing methods for extracting salient visual attributes from these descriptions to use as ‘templates’ for the object categories, and apply vision methods to extract corresponding attributes from test images. A generative model is used to connect textual terms in the learnt templates to visual attributes. We report experiments comparing the performance of humans and the proposed method on a dataset of ten butterfly categories.
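A minimal sketch of the kind of template-based scoring the abstract describes: each category has a set of attribute terms mined from its textual description, and a test image is scored by how confidently the visual detectors find those attributes. The attribute names, confidences and smoothing scheme below are illustrative assumptions, not the paper's actual model.

```python
import math

def score_category(template_attrs, detected_attrs, smoothing=1e-3):
    """Sum log-probabilities of each template attribute being present,
    falling back to a small smoothing value for undetected attributes."""
    score = 0.0
    for attr in template_attrs:
        p = detected_attrs.get(attr, smoothing)  # detector confidence in [0, 1]
        score += math.log(max(p, smoothing))
    return score

# Hypothetical templates mined from species descriptions
templates = {
    "monarch": ["orange wings", "black veins", "white spots"],
    "peacock": ["eyespots", "red wings"],
}
# Hypothetical output of visual attribute detectors for one test image
detections = {"orange wings": 0.9, "black veins": 0.7, "white spots": 0.4}

best = max(templates, key=lambda c: score_category(templates[c], detections))
print(best)  # -> "monarch" under these made-up confidences
```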
cross language evaluation forum | 2015
Mauricio Villegas; Henning Müller; Andrew Gilbert; Luca Piras; Josiah Wang; Krystian Mikolajczyk; Alba Garcia Seco de Herrera; Stefano Bromuri; M. Ashraful Amin; Mahmood Kazi Mohammed; Burak Acar; Suzan Uskudarli; Neda Barzegar Marvasti; José F. Aldana; María del Mar Roldán García
This paper presents an overview of the ImageCLEF 2015 evaluation campaign, an event that was organized as part of the CLEF labs 2015. ImageCLEF is an ongoing initiative that promotes the evaluation of technologies for annotation, indexing and retrieval for providing information access to databases of images in various usage scenarios and domains. In 2015, the 13th edition of ImageCLEF, four main tasks were proposed: (1) automatic concept annotation, localization and sentence description generation for general images; (2) identification, multi-label classification and separation of compound figures from biomedical literature; (3) clustering of x-rays from all over the body; and (4) prediction of missing radiological annotations in reports of liver CT images. The x-ray task was the only fully novel task this year, although the other three tasks introduced modifications to keep the proposed challenges relevant. Participation in this edition of the lab was considerably higher, with almost twice as many working notes papers submitted as in previous years.
computer vision and pattern recognition | 2016
Yuxing Tang; Josiah Wang; Boyang Gao; Emmanuel Dellandréa; Robert J. Gaizauskas; Liming Chen
Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve this previous work by incorporating knowledge about object similarities from visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should exhibit more common transferable properties than dissimilar categories, e.g. a better cat detector results from transferring the differences between a dog classifier and a dog detector than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object similarity based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.
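A minimal sketch of similarity-weighted classifier-to-detector transfer as the abstract outlines it; the function, the linear combination of the two similarity cues, the top-k choice and the random data are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def transfer_detector(clf_target, clf_src, det_src, vis_sim, sem_sim,
                      alpha=0.5, k=3):
    """clf_target: (d,) classifier weights for the target class.
    clf_src, det_src: (n, d) classifier/detector weights for source classes.
    vis_sim, sem_sim: (n,) similarities of each source class to the target."""
    sim = alpha * vis_sim + (1.0 - alpha) * sem_sim   # combine the two cues
    top = np.argsort(sim)[-k:]                        # k most similar sources
    w = sim[top] / sim[top].sum()                     # normalise the weights
    diff = det_src[top] - clf_src[top]                # per-source detector-classifier gap
    return clf_target + w @ diff                      # weighted transfer to the target

rng = np.random.default_rng(0)
d, n = 8, 5
det = transfer_detector(rng.normal(size=d), rng.normal(size=(n, d)),
                        rng.normal(size=(n, d)), rng.random(n), rng.random(n))
print(det.shape)  # (8,)
```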
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Kashif Shah; Josiah Wang; Lucia Specia
This paper describes the University of Sheffield’s submission for the WMT16 Multimodal Machine Translation shared task, where we participated in Task 1 to develop German-to-English and English-to-German statistical machine translation (SMT) systems in the domain of image descriptions. Our proposed systems are standard phrase-based SMT systems based on the Moses decoder, trained only on the provided data. We investigate how image features can be used to re-rank the n-best list produced by the SMT model, with the aim of improving performance by grounding the translations on images. Our submissions are able to outperform the strong, text-only baseline system for both directions.
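A minimal sketch of n-best re-ranking with image evidence, in the spirit of the abstract; the interpolation weight, the relevance function and the toy detections are made-up assumptions rather than the submitted system.

```python
def rerank(nbest, image_feat, relevance, lam=0.7):
    """nbest: list of (hypothesis, smt_score); relevance(hypothesis, image_feat)
    returns a compatibility score grounding the translation in the image."""
    rescored = [(lam * smt + (1 - lam) * relevance(hyp, image_feat), hyp)
                for hyp, smt in nbest]
    return [hyp for _, hyp in sorted(rescored, reverse=True)]

# Toy example: a made-up relevance function counting word overlap with
# hypothetical object labels detected in the image.
detected = {"dog", "ball"}
nbest = [("a cat on the grass", -2.1), ("a dog with a ball", -2.4)]
top = rerank(nbest, detected,
             lambda hyp, img: len(set(hyp.split()) & img) / len(img))
print(top[0])  # "a dog with a ball" once image evidence is taken into account
```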
empirical methods in natural language processing | 2015
Arnau Ramisa; Josiah Wang; Ying Lu; Emmanuel Dellandréa; Francesc Moreno-Noguer; Robert J. Gaizauskas
We investigate the role that geometric, textual and visual features play in the task of predicting a preposition that links two visual entities depicted in an image. The task is an important part of the subsequent process of generating image descriptions. We explore the prediction of prepositions for a pair of entities, both when the labels of such entities are known and when they are unknown. In all situations we found clear evidence that all three features contribute to the prediction task.
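A minimal sketch of preposition prediction from concatenated geometric, textual and visual cues, assuming a standard off-the-shelf classifier; the feature definitions, the stubbed random embeddings and the tiny training set are purely illustrative, not the paper's feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def geometric_features(box_a, box_b):
    """Simple relative-position features for two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return np.array([(ax - bx) / bw, (ay - by) / bh, aw * ah / (bw * bh)])

# Hypothetical training pairs: geometry plus textual/visual embeddings,
# stubbed here with random vectors for illustration only.
rng = np.random.default_rng(1)
X = np.vstack([np.concatenate([geometric_features(a, b), rng.normal(size=4)])
               for a, b in [((10, 5, 20, 30), (0, 40, 50, 20)),
                            ((0, 0, 40, 40), (5, 5, 10, 10)),
                            ((30, 30, 10, 10), (0, 0, 60, 60))]])
y = ["above", "around", "inside"]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))
```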
natural language generation | 2015
Josiah Wang; Robert J. Gaizauskas
In this paper, we present the task of generating image descriptions with gold standard visual detections as input, rather than directly from an image. This allows the Natural Language Generation community to focus on the text generation process, rather than dealing with the noise and complications arising from the visual detection process. We propose a fine-grained evaluation metric specifically for evaluating the content selection capabilities of image description generation systems. To demonstrate the evaluation metric on the task, several baselines are presented using bounding box information and textual information as priors for content selection. The baselines are evaluated using the proposed metric, showing that the fine-grained metric is useful for evaluating the content selection phase of an image description generation system.
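A minimal sketch of the sort of fine-grained content-selection scoring the abstract motivates: comparing the set of gold object instances a system chose to mention against those mentioned in human reference descriptions. The set-based F1 formulation and the instance IDs are assumptions for illustration, not the proposed metric itself.

```python
def content_selection_f1(selected, reference_mentions):
    """Both arguments are sets of object-instance identifiers."""
    if not selected or not reference_mentions:
        return 0.0
    tp = len(selected & reference_mentions)
    precision = tp / len(selected)
    recall = tp / len(reference_mentions)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

# Hypothetical instance IDs from one image's gold bounding boxes
print(content_selection_f1({"person_1", "dog_2", "car_3"},
                           {"person_1", "dog_2"}))  # -> 0.8
```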
international conference on computational linguistics | 2014
Josiah Wang; Fei Yan; Ahmet Aker; Robert J. Gaizauskas
Different people may describe the same object in different ways, and at varied levels of granularity (“poodle”, “dog”, “pet” or “animal”?). In this paper, we propose the idea of ‘granularity-aware’ groupings where semantically related concepts are grouped across different levels of granularity to capture the variation in how different people describe the same image content. The idea is demonstrated in the task of automatic image annotation, where these semantic groupings are used to alter the results of image annotation in a manner that affords different insights from its initial, category-independent rankings. The semantic groupings are also incorporated during evaluation against image descriptions written by humans. Our experiments show that semantic groupings result in image annotations that are more informative and flexible than without groupings, although being too flexible may result in image annotations that are less informative.
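A minimal sketch of granularity-aware grouping as described in the abstract: annotation labels are collapsed into shared concept groups before comparing system output with human description terms. The grouping table below is a made-up toy taxonomy, not the paper's actual groupings.

```python
GROUPS = {  # illustrative grouping, not the paper's taxonomy
    "poodle": "dog", "beagle": "dog", "dog": "dog", "pet": "dog",
    "tabby": "cat", "cat": "cat",
}

def grouped(labels):
    """Map each label to its group, leaving unknown labels unchanged."""
    return {GROUPS.get(label, label) for label in labels}

system_annotations = ["poodle", "tabby", "bicycle"]
human_description_terms = ["dog", "cat"]
overlap = grouped(system_annotations) & grouped(human_description_terms)
print(overlap)  # {'dog', 'cat'}: credit is given despite the granularity mismatch
```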
empirical methods in natural language processing | 2015
Robert J. Gaizauskas; Josiah Wang; Arnau Ramisa
In this paper, we introduce the notion of visually descriptive language (VDL): intuitively, a text segment whose truth can be confirmed by visual sense alone. VDL can be exploited in many vision-based tasks, e.g. image interpretation and story illustration. In contrast to previous work requiring pre-aligned texts and images, we propose a broader definition of VDL that extends to a much larger range of texts without associated images. We also discuss possible VDL annotation tasks and make recommendations for difficult cases. Lastly, we demonstrate the viability of our definition via an annotation exercise across several text genres and analyse inter-annotator agreement. Results show that reasonably high levels of agreement between annotators can be reached.
The Prague Bulletin of Mathematical Linguistics | 2017
Chiraag Lala; Pranava Swaroop Madhyastha; Josiah Wang; Lucia Specia
Recent work on multimodal machine translation has attempted to address the problem of producing target language image descriptions based on both the source language description and the corresponding image. However, existing work has not been conclusive on the contribution of visual information. This paper presents an in-depth study of the problem by examining the differences and complementarities of two related but distinct approaches to this task: text-only neural machine translation and image captioning. We analyse the scope for improvement and the effect of different data and settings to build models for these tasks. We also propose ways of combining these two approaches for improved translation quality.
international conference on natural language generation | 2016
Josiah Wang; Robert J. Gaizauskas
We tackle the sub-task of content selection as part of the broader challenge of automatically generating image descriptions. More specifically, we explore how decisions can be made to select what object instances should be mentioned in an image description, given an image and labelled bounding boxes. We propose casting the content selection problem as a learning to rank problem, where object instances that are most likely to be mentioned by humans when describing an image are ranked higher than those that are less likely to be mentioned. Several features are explored: those derived from bounding box localisations, from concept labels, and from image regions. Object instances are then selected based on the ranked list, where we investigate several methods for choosing a stopping criterion as the ‘cut-off’ point for objects in the ranked list. Our best-performing method achieves state-of-the-art performance on the ImageCLEF2015 sentence generation challenge.
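A minimal sketch of ranking-based content selection with a cut-off, as the abstract describes at a high level; the score threshold and maximum-object cap are one possible stopping criterion assumed here for illustration, and the ranker outputs are invented.

```python
def select_instances(instances, scores, threshold=0.5, max_objects=5):
    """instances: list of instance IDs; scores: ranker outputs in [0, 1] (assumed)."""
    ranked = sorted(zip(scores, instances), reverse=True)
    selected = [inst for score, inst in ranked if score >= threshold]
    return selected[:max_objects]  # cut-off: score threshold plus a hard cap

# Hypothetical ranker outputs for the objects in one image
print(select_instances(["person_1", "tree_4", "dog_2", "pole_7"],
                       [0.92, 0.31, 0.74, 0.12]))
# -> ['person_1', 'dog_2']
```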