Omkar M. Parkhi
University of Oxford
Publications
Featured research published by Omkar M. Parkhi.
British Machine Vision Conference | 2015
Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman
The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end-to-end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large-scale training datasets. We make two contributions: first, we show how a very large-scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade-off between data purity and time; second, we traverse the complexities of deep network training and face recognition to present methods and procedures that achieve results comparable to the state of the art on the standard LFW and YTF face benchmarks.
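As an illustration of the verification setting described above, here is a minimal Python sketch, assuming a pre-trained CNN has already mapped each face image to a fixed-length embedding (the random vectors below are hypothetical stand-ins for such embeddings); two faces are declared the same identity when their L2-normalised embeddings are close:

    import numpy as np

    def l2_normalise(v):
        # Scale an embedding to unit length so Euclidean distance
        # behaves consistently across embeddings of different norms.
        return v / np.linalg.norm(v)

    def same_identity(emb_a, emb_b, threshold=1.0):
        # Verification: accept the pair if the normalised embeddings
        # lie within a threshold tuned on a validation set.
        a, b = l2_normalise(emb_a), l2_normalise(emb_b)
        return bool(np.linalg.norm(a - b) < threshold)

    rng = np.random.default_rng(0)
    e1, e2 = rng.normal(size=256), rng.normal(size=256)  # stand-in embeddings
    print(same_identity(e1, e2))

This is only the comparison step; the paper's contribution lies in the network and dataset used to produce the embeddings.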
British Machine Vision Conference | 2013
Karen Simonyan; Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman
Several recent papers on automatic face verification have significantly raised the performance bar by developing novel, specialised representations that outperform standard features such as SIFT for this problem. This paper makes two contributions: first, and somewhat surprisingly, we show that Fisher vectors computed on densely sampled SIFT features, i.e. an off-the-shelf object recognition representation, are capable of achieving state-of-the-art face verification performance on the challenging “Labeled Faces in the Wild” benchmark; second, since Fisher vectors are very high-dimensional, we show that a compact descriptor can be learnt from them using discriminative metric learning. This compact descriptor achieves better recognition accuracy and is very well suited to large-scale identification tasks.
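A minimal sketch of the Fisher vector encoding underlying this approach, using scikit-learn's GaussianMixture; for brevity it accumulates only the gradients with respect to the GMM means (full Fisher vectors also include variance terms), and the random matrix below stands in for real densely sampled SIFT descriptors:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(descriptors, gmm):
        # Soft-assign each local descriptor to the GMM components, then
        # accumulate normalised gradients w.r.t. the component means.
        q = gmm.predict_proba(descriptors)              # (N, K) posteriors
        diff = descriptors[:, None, :] - gmm.means_     # (N, K, D)
        diff /= np.sqrt(gmm.covariances_)               # diagonal covariances
        fv = (q[:, :, None] * diff).sum(axis=0)
        fv /= descriptors.shape[0] * np.sqrt(gmm.weights_)[:, None]
        fv = fv.ravel()
        fv = np.sign(fv) * np.sqrt(np.abs(fv))          # power normalisation
        return fv / np.linalg.norm(fv)                  # L2 normalisation

    rng = np.random.default_rng(0)
    sift = rng.normal(size=(500, 64))                   # stand-in descriptors
    gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(sift)
    print(fisher_vector(sift, gmm).shape)               # (8 * 64,)

The discriminative metric learning that compresses these high-dimensional vectors is a separate step not shown here.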
Computer Vision and Pattern Recognition | 2012
Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman; C. V. Jawahar
We investigate the fine-grained object categorization problem of determining the breed of an animal from an image. To this end we introduce a new annotated dataset of pets covering 37 different breeds of cats and dogs. The visual problem is very challenging, as these animals, particularly cats, are very deformable and there can be quite subtle differences between the breeds. We make a number of contributions: first, we introduce a model to classify a pet breed automatically from an image. The model combines shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model that describes the pet fur. Fitting the model involves automatically segmenting the animal in the image. Second, we compare two classification approaches: a hierarchical one, in which a pet is first assigned to the cat or dog family and then to a breed, and a flat one, in which the breed is obtained directly. We also investigate a number of animal- and image-oriented spatial layouts. These models are very good: they beat all previously published results on the challenging ASIRRA test (cat vs dog discrimination). When applied to the task of discriminating the 37 different breeds of pets, the models obtain an average accuracy of about 59%, a very encouraging result considering the difficulty of the problem.
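The flat-versus-hierarchical comparison can be sketched with linear SVMs; everything below (the features, the breed labels, the toy cat/dog split) is synthetic and purely illustrative, not the paper's actual pipeline:

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))           # stand-in image features
    breed = rng.integers(0, 37, size=200)    # 37 breed labels
    family = (breed < 12).astype(int)        # toy rule: breeds 0-11 are cats

    # Flat approach: predict the breed directly.
    flat = LinearSVC().fit(X, breed)

    # Hierarchical approach: cat-vs-dog first, then a per-family breed SVM.
    fam_clf = LinearSVC().fit(X, family)
    breed_clf = {f: LinearSVC().fit(X[family == f], breed[family == f])
                 for f in (0, 1)}

    def predict_hierarchical(x):
        f = int(fam_clf.predict(x[None])[0])
        return int(breed_clf[f].predict(x[None])[0])

    print(flat.predict(X[:1])[0], predict_hierarchical(X[0]))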
International Conference on Computer Vision | 2011
Omkar M. Parkhi; Andrea Vedaldi; C. V. Jawahar; Andrew Zisserman
Template-based object detectors such as the deformable parts model of Felzenszwalb et al. [11] achieve state-of-the-art performance for a variety of object categories, but are still outperformed by simpler bag-of-words models for highly flexible objects such as cats and dogs. In these cases we propose to use the template-based model to detect a distinctive part for the class, followed by detecting the rest of the object via segmentation on image-specific information learnt from that part. This approach is motivated by two observations: (i) many object classes contain distinctive parts that can be detected very reliably by template-based detectors, whilst the entire object cannot; (ii) many classes (e.g. animals) have fairly homogeneous coloring and texture that can be used to segment the object once a sample is provided in an image. We show quantitatively that our method substantially outperforms whole-body template-based detectors for these highly deformable object categories, and indeed achieves accuracy comparable to the state of the art on the PASCAL VOC competition, which includes other models such as bag-of-words.
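The part-then-segment idea can be approximated with OpenCV's GrabCut, seeding the colour model from a detected part box; the image and the "head" box below are synthetic stand-ins, and the real system uses a deformable part detector rather than a hand-given rectangle:

    import numpy as np
    import cv2

    def segment_from_part(image, part_box, iterations=5):
        # Seed GrabCut with a rectangle around the detected part; the
        # colour model learnt there propagates to the rest of the
        # (fairly homogeneous) animal body.
        mask = np.zeros(image.shape[:2], np.uint8)
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(image, mask, part_box, bgd, fgd,
                    iterations, cv2.GC_INIT_WITH_RECT)
        # Definite and probable foreground pixels form the segmentation.
        return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))

    img = np.full((120, 160, 3), 255, np.uint8)   # white background
    img[30:90, 40:120] = (30, 60, 200)            # synthetic "animal" blob
    print(segment_from_part(img, (35, 25, 90, 70)).sum())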
Computer Vision and Pattern Recognition | 2014
Omkar M. Parkhi; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman
Our goal is to learn a compact, discriminative vector representation of a face track, suitable for the face recognition tasks of verification and classification. To this end, we propose a novel face track descriptor, based on the Fisher Vector representation, and demonstrate that it has a number of favourable properties. First, the descriptor is suitable for tracks of both frontal and profile faces, and is insensitive to their pose. Second, the descriptor is compact due to discriminative dimensionality reduction, and it can be further compressed using binarization. Third, the descriptor can be computed quickly (using hard quantization), and its compact size and fast computation make it very suitable for large-scale visual repositories. Finally, the descriptor demonstrates good generalization when trained on one dataset and tested on another, reflecting its tolerance to dataset bias. In the experiments we show that the descriptor exceeds the state of the art on both the face verification task (the YouTube Faces benchmark, without outside training data, and the INRIA-Buffy benchmark) and the face classification task (on the Oxford-Buffy dataset).
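The compression and matching steps can be sketched as follows; the projection matrix here is a random stand-in for the discriminatively learnt one, and 4096 is an arbitrary illustrative Fisher vector dimensionality:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(128, 4096))    # stand-in for a learnt projection

    def compact_descriptor(track_fv):
        # Dimensionality reduction followed by sign binarisation gives a
        # small binary code per face track.
        return (W @ track_fv > 0).astype(np.uint8)

    def hamming(a, b):
        # Binary codes are compared with Hamming distance, which is cheap
        # enough for large-scale repositories.
        return int(np.count_nonzero(a != b))

    fv1, fv2 = rng.normal(size=4096), rng.normal(size=4096)
    print(hamming(compact_descriptor(fv1), compact_descriptor(fv2)))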
Workshop on Image Analysis for Multimedia Interactive Services | 2012
Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman
We describe a method of visual search for finding people in large video datasets. The novelty is that the person of interest can be specified at run time by a text query, and a discriminative classifier for that person is then learnt on-the-fly using images downloaded from Google Image search. The performance of the method is evaluated on a ground truth dataset of episodes of Scrubs, and results are also shown for retrieval on the TRECVid 2011 IACC.1.B dataset of over 8k videos. The entire process from specifying the query to receiving the ranked results takes only a matter of seconds.
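A minimal sketch of the on-the-fly step, with random features standing in for descriptors of the downloaded Google images, a fixed negative pool, and the corpus of faces to be ranked:

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    pos = rng.normal(loc=0.5, size=(40, 128))  # features of downloaded query images
    neg = rng.normal(size=(400, 128))          # fixed pool of generic face features
    corpus = rng.normal(size=(1000, 128))      # features of faces in the video corpus

    # Learn a person-specific classifier at query time...
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    clf = LinearSVC(C=1.0).fit(X, y)

    # ...then rank the whole corpus by classifier score.
    ranking = np.argsort(-clf.decision_function(corpus))
    print(ranking[:10])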
IEEE International Conference on Automatic Face & Gesture Recognition | 2017
Nate Crosswhite; Jeffrey Byrne; Chris Stauffer; Omkar M. Parkhi; Qiong Cao; Andrew Zisserman
Face recognition performance evaluation has traditionally focused on one-to-one verification, popularized by the Labeled Faces in the Wild dataset [1] for imagery and the YouTube Faces dataset [2] for videos. In contrast, the newly released IJB-A face recognition dataset [3] unifies evaluation of one-to-many face identification with one-to-one face verification over templates, or sets of imagery and videos for a subject. In this paper, we study the problem of template adaptation, a form of transfer learning to the set of media in a template. Extensive performance evaluations on IJB-A show a surprising result: perhaps the simplest method of template adaptation, combining deep convolutional network features with template-specific linear SVMs, outperforms the state of the art by a wide margin. We study the effects of template size, negative set construction and classifier fusion on performance, then compare template adaptation to convolutional networks with metric learning, and to 2D and 3D alignment. Our unexpected conclusion is that these other methods, when combined with template adaptation, all achieve nearly the same top performance on IJB-A for template-based face verification and identification.
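A sketch of template adaptation as described: one linear SVM per template, trained against a shared negative set, with a symmetric fusion of the two templates' scores at verification time. The features and negative pool are random stand-ins, and C and the equal fusion weights are illustrative choices:

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    negatives = rng.normal(size=(500, 256))    # stand-in negative feature set

    def template_svm(template_feats):
        # One linear SVM per template: its media versus the shared negatives.
        X = np.vstack([template_feats, negatives])
        y = np.r_[np.ones(len(template_feats)), np.zeros(len(negatives))]
        return LinearSVC(C=10.0).fit(X, y)

    def verify(feats_p, feats_q):
        # Score each template's mean feature with the other template's
        # adapted classifier and average the two.
        svm_p, svm_q = template_svm(feats_p), template_svm(feats_q)
        s_pq = svm_p.decision_function(feats_q.mean(0, keepdims=True))[0]
        s_qp = svm_q.decision_function(feats_p.mean(0, keepdims=True))[0]
        return 0.5 * (s_pq + s_qp)

    p, q = rng.normal(size=(8, 256)), rng.normal(size=(5, 256))
    print(verify(p, q))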
Indian Conference on Computer Vision, Graphics and Image Processing | 2014
Makarand Tapaswi; Omkar M. Parkhi; Esa Rahtu; Eric Sommerlade; Rainer Stiefelhagen; Andrew Zisserman
The goal of this paper is unsupervised face clustering in edited video material – where face tracks arising from different people are assigned to separate clusters, with one cluster for each person. In particular, we explore the extent to which faces can be clustered automatically without making an error. This is a very challenging problem given the variation in pose, lighting and expression that can occur, and the similarities between different people. The novelty we bring is threefold: first, we show that a form of weak supervision is available from the editing structure of the material – the shots, threads and scenes that are standard in edited video; second, we show that by first clustering within scenes the number of face tracks can be significantly reduced with almost no errors; third, we propose an extension of the clustering method to entire episodes using exemplar SVMs based on negative training data automatically harvested from the editing structure. The method is demonstrated on multiple episodes from two very different TV series, Scrubs and Buffy. For both series we show that we move towards our goal, and also outperform a number of baselines from previous work.
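The within-scene step can be sketched as greedy agglomerative merging with cannot-link constraints, e.g. between tracks that overlap in the same shot (two faces on screen together must be different people). The descriptors, the constraint set and the stopping threshold below are all illustrative:

    import numpy as np

    def cluster_tracks(desc, cannot_link, threshold):
        # Greedily merge the closest pair of clusters, refusing merges that
        # violate a cannot-link constraint from the editing structure; stop
        # when no allowed pair is closer than the threshold.
        clusters = [{i} for i in range(len(desc))]
        centres = [d.copy() for d in desc]
        while True:
            best, pair = threshold, None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    if any((a, b) in cannot_link or (b, a) in cannot_link
                           for a in clusters[i] for b in clusters[j]):
                        continue
                    d = np.linalg.norm(centres[i] - centres[j])
                    if d < best:
                        best, pair = d, (i, j)
            if pair is None:
                return clusters
            i, j = pair
            clusters[i] |= clusters.pop(j)
            centres[i] = np.mean([desc[k] for k in clusters[i]], axis=0)
            centres.pop(j)

    rng = np.random.default_rng(0)
    tracks = list(rng.normal(size=(6, 16)))   # stand-in track descriptors
    print(cluster_tracks(tracks, {(0, 1)}, threshold=6.0))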
British Machine Vision Conference | 2015
Elliot J. Crowley; Omkar M. Parkhi; Andrew Zisserman
We study the problem of matching photos of a person to paintings of that person, in order to retrieve similar paintings given a query photo. This is challenging as paintings span many media (oil, ink, watercolor) and can vary tremendously in style (caricature, pop art, minimalist). We make the following contributions: (i) we show that, depending on the face representation used, performance can be improved substantially by learning – either by a linear projection matrix common across identities, or by a per-identity classifier. We compare Fisher Vector and Convolutional Neural Network representations for this task; (ii) we introduce new datasets for learning and evaluating this problem; (iii) we also consider the reverse problem of retrieving photos from a large corpus given a painting; and finally, (iv) using the learnt descriptors, we show that, given a photo of a person, we are able to find their doppelgänger in a large dataset of oil paintings, and how this result can be varied by modifying attributes (e.g. frowning, old-looking).
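With a linear projection shared across identities, retrieval at query time reduces to projecting both domains into a common space and ranking by cosine similarity; the matrix and features below are random stand-ins for the learnt projection and the real face descriptors:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 256))            # stand-in for the learnt projection

    def project(x):
        # Map a photo or painting feature into the common space and
        # L2-normalise, so a dot product is cosine similarity.
        z = W @ x
        return z / np.linalg.norm(z)

    paintings = rng.normal(size=(300, 256))   # stand-in painting features
    gallery = np.stack([project(p) for p in paintings])

    photo = rng.normal(size=256)              # stand-in query photo feature
    scores = gallery @ project(photo)
    print(np.argsort(-scores)[:5])            # indices of the top-5 paintings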
International Conference on Multimedia Retrieval | 2013
Kevin McGuinness; Noel E. O'Connor; Robin Aly; Franciska de Jong; Ken Chatfield; Omkar M. Parkhi; Relja Arandjelović; Andrew Zisserman; Matthijs Douze; Cordelia Schmid
We demonstrate a multimedia content information retrieval engine developed for audiovisual digital libraries targeted at media professionals. It is the first of three multimedia IR systems being developed by the AXES project. The system brings together traditional text IR and state-of-the-art content indexing and retrieval technologies to allow users to search and browse digital libraries in novel ways. Key features include: metadata and ASR search and filtering, on-the-fly visual concept classification (categories, faces, places, and logos), and similarity search (instances and faces).