Publication


Featured research published by James Philbin.


computer vision and pattern recognition | 2008

Lost in quantization: Improving particular object retrieval in large scale image databases

James Philbin; Ondrej Chum; Michael Isard; Josef Sivic; Andrew Zisserman

The state of the art in visual object retrieval from large databases is achieved by systems that are inspired by text retrieval. A key component of these approaches is that local regions of images are characterized using high-dimensional descriptors which are then mapped to “visual words” selected from a discrete vocabulary. This paper explores techniques to map each visual region to a weighted set of words, allowing the inclusion of features which were lost in the quantization stage of previous systems. The set of visual words is obtained by selecting words based on proximity in descriptor space. We describe how this representation may be incorporated into a standard tf-idf architecture, and how spatial verification is modified in the case of this soft-assignment. We evaluate our method on the standard Oxford Buildings dataset, and introduce a new dataset for evaluation. Our results exceed the current state-of-the-art retrieval performance on these datasets, particularly on queries with poor initial recall where techniques like query expansion suffer. Overall we show that soft-assignment is always beneficial for retrieval with large vocabularies, at a cost of increased storage requirements for the index.
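A minimal sketch of the soft-assignment step, assuming the vocabulary is given as an array of cluster centres and that each descriptor is weighted over its k nearest visual words with a Gaussian kernel (the function name soft_assign, the kernel width sigma, and all parameter values are illustrative placeholders, not values from the paper):

import numpy as np

def soft_assign(descriptor, vocabulary, k=3, sigma=50.0):
    # vocabulary: (V, D) array of cluster centres; descriptor: (D,) vector.
    # Returns the k nearest visual-word ids and weights summing to one.
    d2 = np.sum((vocabulary - descriptor) ** 2, axis=1)   # squared distances
    nearest = np.argsort(d2)[:k]                          # k nearest words
    # Gaussian weighting on distance; subtracting the minimum keeps the
    # exponentials numerically stable. sigma is a placeholder value.
    w = np.exp(-(d2[nearest] - d2[nearest].min()) / (2.0 * sigma ** 2))
    return nearest, w / w.sum()

# Toy usage: a 1000-word vocabulary over 128-D SIFT-like descriptors.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 128))
word_ids, weights = soft_assign(rng.normal(size=128), vocab)

Each weighted word can then be accumulated into the image's tf-idf vector, in place of the single hard-assigned word used by earlier systems.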


british machine vision conference | 2008

Near Duplicate Image Detection: min-Hash and tf-idf Weighting.

Ondrej Chum; James Philbin; Andrew Zisserman

This paper proposes two novel image similarity measures for fast indexing via locality sensitive hashing. The similarity measures are applied and evaluated in the context of near duplicate image detection. The proposed method uses a visual vocabulary of vector quantized local feature descriptors (SIFT) and for retrieval exploits enhanced min-Hash techniques. Standard min-Hash uses an approximate set intersection between document descriptors as a similarity measure. We propose an efficient way of exploiting more sophisticated similarity measures that have proven to be essential in image / particular object retrieval. The proposed similarity measures do not require extra computational effort compared to the original measure. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. The method requires that only a small amount of data be stored for each image. We demonstrate our method on the TrecVid 2006 data set, which contains approximately 146K key frames, and also on the challenging University of Kentucky image retrieval database.
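The enhanced weighted measures are the paper's contribution, but the plain min-Hash baseline they build on fits in a few lines. A sketch, assuming each image is represented as a set of visual-word ids (independent hash functions are simulated by seeding Python's built-in hash; everything here is illustrative):

def minhash_signature(word_set, seeds):
    # One min-Hash per seed: the minimum of the seeded hash over the set.
    return [min(hash((seed, w)) for w in word_set) for seed in seeds]

def estimated_similarity(sig_a, sig_b):
    # The fraction of agreeing components is an unbiased estimate of the
    # Jaccard similarity |A intersect B| / |A union B| of the word sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

seeds = range(64)                       # 64 independent hash functions
a = {1, 5, 9, 42, 77, 100}              # visual-word ids of image A
b = {1, 5, 9, 42, 80, 101}              # visual-word ids of image B
sim = estimated_similarity(minhash_signature(a, seeds),
                           minhash_signature(b, seeds))
print(sim)                              # close to the true Jaccard, 4/8 = 0.5

The weighted similarity measures proposed in the paper replace this uniform set-overlap estimate; they are not shown here.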


conference on image and video retrieval | 2007

Scalable near identical image and shot detection

Ondřej Chum; James Philbin; Michael Isard; Andrew Zisserman

This paper proposes and compares two novel schemes for near duplicate image and video-shot detection. The first approach is based on global hierarchical colour histograms, using Locality Sensitive Hashing for fast retrieval. The second approach uses local feature descriptors (SIFT) and for retrieval exploits techniques used in the information retrieval community to compute approximate set intersections between documents using a min-Hash algorithm. The requirements for near-duplicate images vary according to the application, and we address two types of near duplicate definition: (i) being perceptually identical (e.g. up to noise, discretization effects, small photometric distortions etc); and (ii) being images of the same 3D scene (so allowing for viewpoint changes and partial occlusion). We define two shots to be near-duplicates if they share a large percentage of near-duplicate frames. We focus primarily on scalability to very large image and video databases, where fast query processing is necessary. Both methods are designed so that only a small amount of data need be stored for each image. In the case of near-duplicate shot detection it is shown that a weak approximation to histogram matching, consuming substantially less storage, is sufficient for good results. We demonstrate our methods on the TRECVID 2006 data set which contains approximately 165 hours of video (about 17.8M frames with 146K key frames), and also on feature films and pop videos.
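For the first scheme, one concrete locality sensitive hashing family is sign random projections, which send similar histograms to the same key with high probability. A sketch under that assumption (the abstract does not specify the exact LSH family, so treat this choice as illustrative):

import numpy as np

def lsh_keys(histograms, n_bits=16, seed=0):
    # Each random hyperplane contributes one bit: which side of the plane
    # the histogram falls on. Near-identical histograms flip few bits and
    # therefore collide in the hash table with high probability.
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, histograms.shape[1]))
    bits = histograms @ planes.T > 0
    return ["".join("1" if b else "0" for b in row) for row in bits]

# Toy usage: the first two colour histograms are near-duplicates, the
# third is unrelated, so the first two keys agree on almost all bits.
rng = np.random.default_rng(1)
h1 = rng.random(48)
h2 = h1 + rng.normal(scale=0.01, size=48)
h3 = rng.random(48)
print(lsh_keys(np.stack([h1, h2, h3])))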


indian conference on computer vision, graphics and image processing | 2008

Object Mining Using a Matching Graph on Very Large Image Collections

James Philbin; Andrew Zisserman

Automatic organization of large, unordered image collections is an extremely challenging problem with many potential applications. Often, what is required is that images taken in the same place, of the same thing, or of the same person be conceptually grouped together. This work focuses on grouping images containing the same object, despite significant changes in scale, viewpoint and partial occlusions, in very large (1M+) image collections automatically gathered from Flickr. The scale of the data and the extreme variation in imaging conditions make the problem very challenging. We describe a scalable method that first computes a matching graph over all the images. Image groups can then be mined from this graph using standard clustering techniques. The novelty we bring is that both the matching graph and the clustering methods are able to use the spatial consistency between the images arising from the common object (if there is one). We demonstrate our methods on a publicly available dataset of 5K images of Oxford, a 37K image dataset containing images of the Statue of Liberty, and a much larger 1M image dataset of Rome. This is, to our knowledge, the largest dataset to which image-based data mining has been applied.
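The clustering step can be as simple as taking connected components of the matching graph. A minimal sketch using union-find, assuming the spatially verified image pairs have already been computed (the pair list and function names are illustrative):

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]    # path halving
        x = parent[x]
    return x

def cluster_matching_graph(n_images, verified_pairs):
    # verified_pairs: (i, j) image pairs that passed spatial verification,
    # e.g. with enough RANSAC inliers. Returns the connected components.
    parent = list(range(n_images))
    for i, j in verified_pairs:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri
    groups = {}
    for x in range(n_images):
        groups.setdefault(find(parent, x), []).append(x)
    return list(groups.values())

print(cluster_matching_graph(6, [(0, 1), (1, 2), (4, 5)]))
# -> [[0, 1, 2], [3], [4, 5]]

The paper's spatial-consistency scoring of each edge is the part that matters for precision; the component extraction above is the routine tail end of the pipeline.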


european conference on computer vision | 2016

PlaNet - Photo Geolocation with Convolutional Neural Networks

Tobias Weyand; Ilya Kostrikov; James Philbin

Is it possible to determine the location of a photo from just its pixels? While the general problem seems exceptionally difficult, photos often contain cues such as landmarks, weather patterns, vegetation, road markings, or architectural details, which in combination allow one to infer where the photo was taken. Previously, this problem has been approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, this model achieves a 50% performance improvement over the single-image model.
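The classification framing is easy to illustrate. A toy version that maps a geotag to a class label on a uniform latitude/longitude grid (PlaNet actually uses adaptive multi-scale cells, so this fixed grid is a deliberate simplification):

def cell_id(lat, lon, cells_per_side=64):
    # Uniform grid over the globe; each cell becomes one class label.
    row = min(int((lat + 90.0) / 180.0 * cells_per_side), cells_per_side - 1)
    col = min(int((lon + 180.0) / 360.0 * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col

# Every geotagged training photo becomes a (pixels, cell_id) pair, and a
# CNN is trained as an ordinary classifier over the cell labels.
print(cell_id(48.8584, 2.2945))    # Eiffel Tower -> one of 64 * 64 classes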


International Journal of Computer Vision | 2011

Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets

James Philbin; Josef Sivic; Andrew Zisserman

Given a large-scale collection of images our aim is to efficiently associate images which contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. This explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how “hub images” (images representative of an object or landmark) can easily be extracted from our matching graph representation. We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images).
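One simple reading of the hub-image idea: within a cluster of the matching graph, take the image most strongly connected to the rest. A sketch, assuming edge weights such as inlier counts are available (this degree-based criterion is an illustration, not necessarily the paper's exact rule):

def hub_image(cluster, edge_weights):
    # edge_weights: dict mapping (i, j) pairs to match strength, e.g.
    # the number of spatially consistent feature matches.
    members = set(cluster)
    def strength(i):
        return sum(w for (a, b), w in edge_weights.items()
                   if (a == i and b in members) or (b == i and a in members))
    return max(cluster, key=strength)

weights = {(0, 1): 120, (0, 2): 90, (1, 2): 15}
print(hub_image([0, 1, 2], weights))    # -> 0, the best-connected image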


international conference on computer vision | 2009

Efficient retrieval of deformable shape classes using local self-similarities

Ken Chatfield; James Philbin; Andrew Zisserman

We present an efficient object retrieval system based on the identification of abstract deformable ‘shape’ classes using the self-similarity descriptor of Shechtman and Irani [13]. Given a user-specified query object, we retrieve other images which share a common ‘shape’ even if their appearance differs greatly in terms of colour, texture, edges and other common photometric properties. In order to use the self-similarity descriptor for efficient retrieval we make three contributions: (i) we sparsify the descriptor points by locating discriminative regions within each image, thus reducing the computational expense of shape matching; (ii) we extend [13] to enable matching despite changes in scale; and (iii) we show that vector quantizing the descriptor does not inhibit performance, thus providing the basis of a large-scale shape-based retrieval system using a bag-of-visual-words approach. Performance is demonstrated on the challenging ETHZ deformable shape dataset and a full episode from the television series Lost, and is shown to be superior to appearance-based approaches for matching non-rigid shape classes.
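Contribution (iii) amounts to the standard bag-of-visual-words pipeline applied to self-similarity descriptors. A minimal sketch of the quantization step (descriptor extraction itself is not shown; shapes and sizes are illustrative):

import numpy as np

def bow_histogram(descriptors, vocabulary):
    # Hard-assign each local descriptor to its nearest vocabulary word
    # and return the L1-normalised bag-of-visual-words histogram.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() or 1.0)

rng = np.random.default_rng(0)
vocab = rng.normal(size=(200, 30))       # toy vocabulary of 200 words
desc = rng.normal(size=(500, 30))        # descriptors from one image
print(bow_histogram(desc, vocab).shape)  # -> (200,)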


british machine vision conference | 2008

Geometric LDA: A Generative Model for Particular Object Discovery

James Philbin; Josef Sivic; Andrew Zisserman

Automatically organizing collections of images presents serious challenges to the current state-of-the-art methods in image data mining. Often, what is required is that images taken in the same place, of the same thing, or of the same person be conceptually grouped together. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised particular object discovery in unordered image collections. This explicitly represents documents as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. We demonstrate the model on a publicly available dataset of Oxford images, and show examples of spatially consistent groupings.


conference on image and video retrieval | 2008

University of Oxford video retrieval system

James Philbin; Andrew Zisserman

Fast multimedia retrieval in large video datasets remains an extremely challenging problem. This system explores how novel techniques from Computer Vision can be used to search in cases where the subtitle text is uninformative (e.g. actions, particular objects).


computer vision and pattern recognition | 2007

Object retrieval with large vocabularies and fast spatial matching

James Philbin; Ondrej Chum; Michael Isard; Josef Sivic; Andrew Zisserman

Collaboration


Dive into James Philbin's collaborations.

Top Co-Authors

Josef Sivic

École Normale Supérieure

Ondrej Chum

Czech Technical University in Prague
