
Publication


Featured research published by Karen Simonyan.


British Machine Vision Conference | 2014

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Ken Chatfield; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

The latest generation of Convolutional Neural Networks (CNNs) has achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures and comparing them on a common ground, identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. In particular, we show that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost. Source code and models to reproduce the experiments in the paper are made publicly available.
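The abstract's observation that the CNN output layer can be reduced significantly in dimensionality can be sketched with a plain PCA projection. This is only an illustration of the mechanics, not the authors' procedure: the 4096-D "features" below are random stand-ins, not real network outputs.

```python
import numpy as np

# Illustrative sketch: project 4096-D stand-in "CNN features" down to
# 128-D with PCA and measure how much variance the projection keeps.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))  # stand-in for fc-layer outputs

# Centre the data and take the leading principal components via SVD.
mean = features.mean(axis=0)
centred = features - mean
_, s, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:128]                    # 128 leading directions
reduced = centred @ components.T         # (500, 128) compact descriptors

explained = float((s[:128] ** 2).sum() / (s ** 2).sum())
print(reduced.shape, round(explained, 3))
```

In the paper the claim is an empirical one about recognition benchmarks; here the point is only how a low-dimensional descriptor is obtained from a high-dimensional layer output.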


British Machine Vision Conference | 2013

Fisher Vector Faces in the Wild

Karen Simonyan; Omkar M. Parkhi; Andrea Vedaldi; Andrew Zisserman

Several recent papers on automatic face verification have significantly raised the performance bar by developing novel, specialised representations that outperform standard features such as SIFT for this problem. This paper makes two contributions: first, and somewhat surprisingly, we show that Fisher vectors on densely sampled SIFT features, i.e. an off-the-shelf object recognition representation, are capable of achieving state-of-the-art face verification performance on the challenging “Labeled Faces in the Wild” benchmark; second, since Fisher vectors are very high dimensional, we show that a compact descriptor can be learnt from them using discriminative metric learning. This compact descriptor achieves better recognition accuracy and is very well suited to large-scale identification tasks.
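A minimal sketch of the Fisher-vector encoding the paper builds on, keeping only the mean-gradient part. All quantities below (the GMM and the descriptors) are random stand-ins; the actual pipeline fits the GMM on densely sampled SIFT features.

```python
import numpy as np

# Illustrative Fisher-vector sketch (mean-gradient part only): encode a
# set of local descriptors against a diagonal-covariance GMM.
rng = np.random.default_rng(1)
D, K, N = 8, 4, 100                      # descriptor dim, GMM size, #descriptors
descs = rng.normal(size=(N, D))          # stand-in for dense SIFT descriptors
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))                 # diagonal standard deviations
weights = np.full(K, 1.0 / K)

# Posterior (soft assignment) of each descriptor to each Gaussian.
diff = descs[:, None, :] - means[None, :, :]          # (N, K, D)
log_p = (-0.5 * (diff / sigmas) ** 2 - np.log(sigmas)).sum(axis=2)
log_p += np.log(weights)                              # (N, K)
post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

# Accumulate the gradient w.r.t. the means, then power- and L2-normalise.
fv = (post[:, :, None] * diff / sigmas).sum(axis=0)   # (K, D)
fv = fv.reshape(-1) / (N * np.sqrt(weights).repeat(D))
fv = np.sign(fv) * np.sqrt(np.abs(fv))                # power normalisation
fv /= np.linalg.norm(fv)                              # L2 normalisation
print(fv.shape)                                       # (K * D,) encoding
```

The resulting K·D-dimensional vector is what the paper then compresses with discriminative metric learning; that learning step is not sketched here.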


International Journal of Computer Vision | 2016

Reading Text in the Wild with Convolutional Neural Networks

Max Jaderberg; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

In this work we present an end-to-end system for text spotting—localising and recognising text in natural scene images—and text-based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast subsequent filtering stage for improving precision. For the recognition and ranking of proposals, we train very large convolutional neural networks to perform word recognition on the whole proposal region at the same time, departing from the character classifier based systems of the past. These networks are trained solely on data produced by a synthetic text generation engine, requiring no human labelled data. Analysing the stages of our pipeline, we show state-of-the-art performance throughout. We perform rigorous experiments across a number of standard end-to-end text spotting benchmarks and text-based image retrieval datasets, showing a large improvement over all previous methods. Finally, we demonstrate a real-world application of our text spotting system to allow thousands of hours of news footage to be instantly searchable via a text query.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Learning Local Feature Descriptors Using Convex Optimisation

Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

The objective of this work is to learn descriptors suitable for the sparse feature detectors used in viewpoint invariant matching. We make a number of novel contributions towards this goal. First, it is shown that learning the pooling regions for the descriptor can be formulated as a convex optimisation problem selecting the regions using sparsity. Second, it is shown that descriptor dimensionality reduction can also be formulated as a convex optimisation problem, using Mahalanobis matrix nuclear norm regularisation. Both formulations are based on discriminative large margin learning constraints. As the third contribution, we evaluate the performance of the compressed descriptors, obtained from the learnt real-valued descriptors by binarisation. Finally, we propose an extension of our learning formulations to a weakly supervised case, which allows us to learn the descriptors from unannotated image collections. It is demonstrated that the new learning methods improve over the state of the art in descriptor learning on the annotated local patches data set of Brown et al. and unannotated photo collections of Philbin et al.
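The role of nuclear norm regularisation in the dimensionality-reduction formulation can be illustrated through its proximal operator, which soft-thresholds singular values and so drives the learnt matrix toward low rank. This is a generic sketch of that mechanism, not the authors' solver.

```python
import numpy as np

def prox_nuclear(W, tau):
    """Proximal step for tau * nuclear norm: soft-threshold the spectrum."""
    u, s, vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - tau, 0.0)          # small singular values vanish
    return u @ np.diag(s) @ vt

rng = np.random.default_rng(2)
# A rank-3 matrix contaminated with small noise.
W = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 20))
W += 0.01 * rng.normal(size=(20, 20))

shrunk = prox_nuclear(W, tau=1.0)
rank = int(np.linalg.matrix_rank(shrunk, tol=1e-8))
print(rank)  # the noise directions are thresholded away
```

This is why the nuclear norm acts as a convex surrogate for rank: the signal's few large singular values survive the threshold while the noise spectrum is zeroed, yielding a low-dimensional projection.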


Computer Vision and Pattern Recognition | 2014

A Compact and Discriminative Face Track Descriptor

Omkar M. Parkhi; Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

Our goal is to learn a compact, discriminative vector representation of a face track, suitable for the face recognition tasks of verification and classification. To this end, we propose a novel face track descriptor, based on the Fisher Vector representation, and demonstrate that it has a number of favourable properties. First, the descriptor is suitable for tracks of both frontal and profile faces, and is insensitive to their pose. Second, the descriptor is compact due to discriminative dimensionality reduction, and it can be further compressed using binarization. Third, the descriptor can be computed quickly (using hard quantization) and its compact size and fast computation render it very suitable for large scale visual repositories. Finally, the descriptor demonstrates good generalization when trained on one dataset and tested on another, reflecting its tolerance to dataset bias. In the experiments we show that the descriptor exceeds the state of the art on both the face verification task (YouTube Faces without outside training data, and INRIA-Buffy benchmarks) and the face classification task (using the Oxford-Buffy dataset).


Asian Conference on Computer Vision | 2014

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

Tomas Pfister; Karen Simonyan; James Charles; Andrew Zisserman

Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is the first to our knowledge to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background.


European Conference on Computer Vision | 2012

Descriptor learning using convex optimisation

Karen Simonyan; Andrea Vedaldi; Andrew Zisserman

The objective of this work is to learn descriptors suitable for the sparse feature detectors used in viewpoint invariant matching. We make a number of novel contributions towards this goal: first, it is shown that learning the pooling regions for the descriptor can be formulated as a convex optimisation problem selecting the regions using sparsity; second, it is shown that dimensionality reduction can also be formulated as a convex optimisation problem, using the nuclear norm to reduce dimensionality. Both of these problems use large margin discriminative learning methods. The third contribution is a new method of obtaining the positive and negative training data in a weakly supervised manner. And, finally, we employ a state-of-the-art stochastic optimizer that is efficient and well matched to the non-smooth cost functions proposed here. It is demonstrated that the new learning methods improve over the state of the art in descriptor learning for large scale matching, Brown et al. [2], and large scale object retrieval, Philbin et al. [10].


Computer Vision and Pattern Recognition | 2014

Understanding Objects in Detail with Fine-Grained Attributes

Andrea Vedaldi; Siddharth Mahendran; Stavros Tsogkas; Subhransu Maji; Ross B. Girshick; Juho Kannala; Esa Rahtu; Iasonas Kokkinos; Matthew B. Blaschko; David Weiss; Ben Taskar; Karen Simonyan; Naomi Saphra; Sammy Mohamed

We study the problem of understanding objects in detail, intended as recognizing a wide array of fine-grained object attributes. To this end, we introduce a dataset of 7,413 airplanes annotated in detail with parts and their attributes, leveraging images donated by airplane spotters and crowd-sourcing both the design and collection of the detailed annotations. We provide a number of insights that should help researchers interested in designing fine-grained datasets for other basic level categories. We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object. We note that the prediction of certain attributes can benefit substantially from accurate part detection. We also show that, differently from previous results in object detection, employing a large number of part templates can improve detection accuracy at the expense of detection speed. We finally propose a coarse-to-fine approach to speed up detection through a hierarchical cascade algorithm.


Medical Image Computing and Computer-Assisted Intervention | 2011

Immediate structured visual search for medical images

Karen Simonyan; Andrew Zisserman; Antonio Criminisi

The objective of this work is a scalable, real-time visual search engine for medical images. In contrast to existing systems that retrieve images that are globally similar to a query image, we enable the user to select a query Region Of Interest (ROI) and automatically detect the corresponding regions within all returned images. This allows the returned images to be ranked on the content of the ROI, rather than the entire image. Our contribution is two-fold: (i) immediate retrieval - the data is appropriately pre-processed so that the search engine returns results in real-time for any query image and ROI; (ii) structured output - returning ROIs with a choice of ranking functions. The retrieval performance is assessed on a number of annotated queries for images from the IRMA X-ray dataset and compared to a baseline.
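The pre-processing and ROI-matching idea can be sketched as follows: each database image is indexed in advance as a set of region descriptors, and at query time images are ranked by their best-matching region to the query ROI. The index below is random stand-in data, not the paper's features.

```python
import numpy as np

# Illustrative ROI-based ranking (not the paper's search engine).
rng = np.random.default_rng(4)
dim = 32
query_roi = rng.normal(size=dim)          # descriptor of the selected ROI

# Pre-processed index: 10 images, each with 6 candidate region descriptors.
index = rng.normal(size=(10, 6, dim))
index[7, 2] = query_roi + 0.05 * rng.normal(size=dim)  # plant a true match

# Cosine similarity of every region to the query ROI; an image's score is
# its best region, which also localises the corresponding ROI.
norms = np.linalg.norm(index, axis=2) * np.linalg.norm(query_roi)
sims = (index @ query_roi) / norms        # (10, 6)
best_region = sims.argmax(axis=1)         # matched region per image
scores = sims.max(axis=1)
top = int(scores.argmax())
print(top, int(best_region[top]))         # image 7, region 2
```

Because the per-image region descriptors are computed offline, only the similarity pass runs at query time, which is the "immediate retrieval" property the abstract describes.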


Asian Conference on Computer Vision | 2014

Efficient On-the-fly Category Retrieval Using ConvNets and GPUs

Ken Chatfield; Karen Simonyan; Andrew Zisserman

We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval – where classifiers are learnt at run time for a textual query from downloaded images, and used to rank large image or video datasets.
