Network


Mark Everingham's latest external collaborations at the country level.

Hotspot


Dive into the research topics where Mark Everingham is active.

Publication


Featured research published by Mark Everingham.


International Journal of Computer Vision | 2010

The Pascal Visual Object Classes (VOC) Challenge

Mark Everingham; Luc Van Gool; Christopher K. I. Williams; John Winn; Andrew Zisserman

The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension.
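As a concrete illustration of the kind of evaluation procedure the challenge standardised, the sketch below computes an 11-point interpolated average precision from a ranked list of detections; the scores and labels are toy values, not taken from the challenge.

def average_precision(scores, labels, n_positive):
    # Rank detections by confidence and accumulate precision/recall.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_positive)
        precisions.append(tp / (tp + fp))
    # 11-point interpolation as used in the early VOC evaluations.
    ap = 0.0
    for t in (k / 10 for k in range(11)):
        ap += max((p for r, p in zip(recalls, precisions) if r >= t), default=0.0) / 11
    return ap

# Toy example: five detections ranked by confidence, three ground-truth objects.
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0], n_positive=3))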


British Machine Vision Conference | 2006

Hello! My name is... Buffy - Automatic Naming of Characters in TV Video

Mark Everingham; Josef Sivic; Andrew Zisserman

We investigate the problem of automatically labelling appearances of characters in TV or film material. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking; (iii) using complementary cues of face matching and clothing matching to propose common annotations for face tracks. Results are presented on episodes of the TV series “Buffy the Vampire Slayer”.
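The first contribution, time-stamped character annotation obtained by aligning subtitles with a transcript, can be sketched very roughly as below; the toy data and the use of difflib for fuzzy matching are assumptions made for illustration, not the paper's implementation.

from difflib import SequenceMatcher

# Subtitles carry timing but no speaker; the transcript carries speakers but no timing.
subtitles = [  # (start_sec, end_sec, text) -- toy data
    (12.0, 14.5, "We need to talk about the prophecy."),
    (15.0, 16.2, "Not again, Giles."),
]
transcript = [  # (speaker, line) -- toy data
    ("GILES", "We need to talk about the prophecy."),
    ("BUFFY", "Not again, Giles."),
]

def best_speaker(subtitle_text):
    # Pick the transcript line most similar to the subtitle text.
    ratios = [(SequenceMatcher(None, subtitle_text.lower(), line.lower()).ratio(), name)
              for name, line in transcript]
    return max(ratios)[1]

for start, end, text in subtitles:
    print(f"{start:5.1f}-{end:5.1f}s  {best_speaker(text):<6s}  {text}")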


International Conference on Machine Learning | 2005

The 2005 PASCAL visual object classes challenge

Mark Everingham; Andrew Zisserman; Christopher K. I. Williams; Luc Van Gool; Moray Allan; Christopher M. Bishop; Olivier Chapelle; Navneet Dalal; Thomas Deselaers; Gyuri Dorkó; Stefan Duffner; Jan Eichhorn; Jason Farquhar; Mario Fritz; Christophe Garcia; Thomas L. Griffiths; Frédéric Jurie; Daniel Keysers; Markus Koskela; Jorma Laaksonen; Diane Larlus; Bastian Leibe; Hongying Meng; Hermann Ney; Bernt Schiele; Cordelia Schmid; Edgar Seemann; John Shawe-Taylor; Amos J. Storkey; Sandor Szedmak

The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, algorithms used by the teams, evaluation criteria, and results achieved.


British Machine Vision Conference | 2010

Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation

Sam Johnson; Mark Everingham

We investigate the task of 2D articulated human pose estimation in unconstrained still images. This is extremely challenging because of variation in pose, anatomy, clothing, and imaging conditions. Current methods use simple models of body part appearance and plausible configurations due to limitations of available training data and constraints on computational expense. We show that such models severely limit accuracy. Building on the successful pictorial structure model (PSM), we propose richer models of both appearance and pose, using state-of-the-art discriminative classifiers without introducing unacceptable computational expense. We introduce a new annotated database of challenging consumer images, an order of magnitude larger than currently available datasets, and demonstrate over 50% relative improvement in pose estimation accuracy over a state-of-the-art method.
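The pictorial structure model (PSM) that this work builds on allows exact inference by dynamic programming over a tree of parts; the sketch below runs max-product over a short chain of parts with invented scores, and is only meant to show the structure of that inference, not the paper's richer appearance and pose models.

import numpy as np

rng = np.random.default_rng(0)
n_parts, n_locations = 3, 5                 # e.g. torso -> upper arm -> lower arm
appearance = rng.normal(size=(n_parts, n_locations))   # toy unary (appearance) scores
positions = np.arange(n_locations, dtype=float)        # 1-D candidate locations

def deformation(i, j):
    # Quadratic penalty for parts drifting apart; real PSMs use learnt 2-D offsets.
    return -0.5 * (positions[i] - positions[j]) ** 2

# Forward pass: best score of the chain ending with part p placed at location l.
score = appearance[0].copy()
backptrs = []
for p in range(1, n_parts):
    best = np.empty(n_locations)
    arg = np.empty(n_locations, dtype=int)
    for l in range(n_locations):
        candidates = [score[k] + deformation(k, l) for k in range(n_locations)]
        arg[l] = int(np.argmax(candidates))
        best[l] = candidates[arg[l]] + appearance[p, l]
    backptrs.append(arg)
    score = best

# Backtrack the maximising configuration from the last part to the first.
config = [int(np.argmax(score))]
for arg in reversed(backptrs):
    config.append(int(arg[config[-1]]))
print("best part locations:", list(reversed(config)))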


Conference on Image and Video Retrieval | 2005

Person spotting: video shot retrieval for face sets

Josef Sivic; Mark Everingham; Andrew Zisserman

Matching people based on their imaged face is hard because of the well known problems of illumination, pose, size and expression variation. Indeed these variations can exceed those due to identity. Fortunately, videos of people have the happy benefit of containing multiple exemplars of each person in a form that can easily be associated automatically using straightforward visual tracking. We describe progress in harnessing these multiple exemplars in order to retrieve humans automatically in videos, given a query face in a shot. There are three areas of interest: (i) the matching of sets of exemplars provided by “tubes” of the spatial-temporal volume; (ii) the description of the face using a spatial orientation field; and, (iii) the structuring of the problem so that retrieval is immediate at run time. The result is a person retrieval system, able to retrieve a ranked list of shots containing a particular person in the manner of Google. The method has been implemented and tested on two feature length movies.
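The idea of matching sets of exemplars rather than single faces can be sketched as a minimum pairwise distance between descriptor sets; the random vectors below stand in for real face descriptors, and the min-min set distance is an illustrative choice rather than the paper's exact matching scheme.

import numpy as np

rng = np.random.default_rng(1)
query_track = rng.normal(size=(4, 128))                # 4 face descriptors for the query
shot_tracks = {f"shot_{i}": rng.normal(size=(int(rng.integers(3, 8)), 128))
               for i in range(5)}                      # face "tubes" from each shot

def set_distance(a, b):
    # Min-min distance: the closest pair of exemplars across the two sets.
    return float(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min())

# Rank shots by how closely any of their faces matches any query exemplar.
ranking = sorted(shot_tracks, key=lambda s: set_distance(query_track, shot_tracks[s]))
for shot in ranking:
    print(shot, round(set_distance(query_track, shot_tracks[shot]), 2))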


Computer Vision and Pattern Recognition | 2011

Learning effective human pose estimation from inaccurate annotation

Sam Johnson; Mark Everingham

The task of 2-D articulated human pose estimation in natural images is extremely challenging due to the high level of variation in human appearance. These variations arise from different clothing, anatomy, imaging conditions and the large number of poses it is possible for a human body to take. Recent work has shown state-of-the-art results by partitioning the pose space and using strong nonlinear classifiers such that the pose dependence and multi-modal nature of body part appearance can be captured. We propose to extend these methods to handle much larger quantities of training data, an order of magnitude larger than current datasets, and show how to utilize Amazon Mechanical Turk and a latent annotation update scheme to achieve high quality annotations at low cost. We demonstrate a significant increase in pose estimation accuracy, while simultaneously reducing computational expense by a factor of 10, and contribute a dataset of 10,000 highly articulated poses.
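The latent annotation update scheme, alternating between fitting a model and correcting annotations the model disagrees with, can be caricatured on 1-D toy data as below; the linear "pose model", the outlier threshold and the noise levels are all invented purely to show the alternation, not the paper's method.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=200)
true_y = 2.0 * x + 1.0                                  # ground-truth "pose"
annotations = true_y + rng.normal(scale=0.05, size=x.size)
bad = rng.choice(x.size, size=40, replace=False)        # a subset of noisy crowd labels
annotations[bad] += rng.normal(scale=1.0, size=bad.size)

for it in range(5):
    # Fit the model to the current annotations (here just a line).
    slope, intercept = np.polyfit(x, annotations, deg=1)
    predictions = slope * x + intercept
    # Latent update: snap clearly inconsistent annotations back to the model.
    outliers = np.abs(annotations - predictions) > 0.3
    annotations[outliers] = predictions[outliers]
    print(f"iter {it}: updated {int(outliers.sum()):3d} annotations, "
          f"mean error {np.mean(np.abs(annotations - true_y)):.3f}")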


Lecture Notes in Computer Science | 2006

Dataset Issues in Object Recognition

Jean Ponce; Tamara L. Berg; Mark Everingham; David A. Forsyth; Martial Hebert; Svetlana Lazebnik; Marcin Marszalek; Cordelia Schmid; Bryan C. Russell; Antonio Torralba; Christopher K. I. Williams; Jianguo Zhang; Andrew Zisserman

Appropriate datasets are required at all stages of object recognition research, including learning visual models of object and scene categories, detecting and localizing instances of these models in images, and evaluating the performance of recognition algorithms. Current datasets are lacking in several respects, and this paper discusses some of the lessons learned from existing efforts, as well as innovative ways to obtain very large and diverse annotated datasets. It also suggests a few criteria for gathering future datasets.


British Machine Vision Conference | 2009

Learning Models for Object Recognition from Natural Language Descriptions

Josiah Wang; Katja Markert; Mark Everingham

We investigate the task of learning models for visual object recognition from natural language descriptions alone. The approach contributes to the recognition of fine-grained object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions of visual attributes are readily available. As an example we tackle recognition of butterfly species, learning models from descriptions in an online nature guide. We propose natural language processing methods for extracting salient visual attributes from these descriptions to use as ‘templates’ for the object categories, and apply vision methods to extract corresponding attributes from test images. A generative model is used to connect textual terms in the learnt templates to visual attributes. We report experiments comparing the performance of humans and the proposed method on a dataset of ten butterfly categories.
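A heavily simplified version of turning a textual description into an attribute "template" and scoring it against attributes predicted from an image might look like the sketch below; the vocabulary, the example description and the detected attributes are invented, and the paper's NLP pipeline and generative model are far richer.

# Toy colour/pattern vocabularies standing in for a learnt attribute lexicon.
COLOUR_TERMS = {"orange", "black", "white", "blue", "yellow", "brown"}
PATTERN_TERMS = {"spots", "spotted", "stripes", "striped", "bands", "banded"}

def template_from_description(text):
    # Extract salient visual attribute terms from the description.
    words = {w.strip(".,;").lower() for w in text.split()}
    return {"colours": words & COLOUR_TERMS, "patterns": words & PATTERN_TERMS}

def match_score(template, image_attributes):
    # Fraction of the template's attributes found in the image.
    wanted = template["colours"] | template["patterns"]
    return len(wanted & image_attributes) / max(len(wanted), 1)

description = "Wings orange with black bands and small white spots."
template = template_from_description(description)
print(template)
print("match:", match_score(template, image_attributes={"orange", "black", "spots"}))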


British Machine Vision Conference | 2008

Long term arm and hand tracking for continuous sign language TV broadcasts

Patrick Buehler; Mark Everingham; Daniel P. Huttenlocher; Andrew Zisserman

The goal of this work is to detect hand and arm positions over continuous sign language video sequences of more than one hour in length. We cast the problem as inference in a generative model of the image. Under this model, limb detection is expensive due to the very large number of possible configurations each part can assume. We make the following contributions to reduce this cost: (i) using efficient sampling from a pictorial structure proposal distribution to obtain reasonable configurations; (ii) identifying a large set of frames where correct configurations can be inferred, and using temporal tracking elsewhere. Results are reported for signing footage with changing background, challenging image conditions, and different signers; and we show that the method is able to identify the true arm and hand locations. The results exceed the state-of-the-art for the length and stability of continuous limb tracking.


Computer Vision and Pattern Recognition | 2009

Learning sign language by watching TV (using weakly aligned subtitles)

Patrick Buehler; Andrew Zisserman; Mark Everingham

The goal of this work is to automatically learn a large number of British sign language (BSL) signs from TV broadcasts. We achieve this by using the supervisory information available from subtitles broadcast simultaneously with the signing. This supervision is both weak and noisy: it is weak due to the correspondence problem since temporal distance between sign and subtitle is unknown and signing does not follow the text order; it is noisy because subtitles can be signed in different ways, and because the occurrence of a subtitle word does not imply the presence of the corresponding sign. The contributions are: (i) we propose a distance function to match signing sequences which includes the trajectory of both hands, the hand shape and orientation, and properly models the case of hands touching; (ii) we show that by optimizing a scoring function based on multiple instance learning, we are able to extract the sign of interest from hours of signing footage, despite the very weak and noisy supervision. The method is automatic given the English target word of the sign to be learnt. Results are presented for 210 words including nouns, verbs and adjectives.
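The multiple instance learning view, in which a subtitle window containing the target word forms a positive "bag" of candidate sign windows and other footage forms negative bags, can be sketched on 1-D toy trajectories as below; the data, distance function and scoring rule are illustrative assumptions rather than the paper's two-hand trajectory and shape distance.

import numpy as np

rng = np.random.default_rng(3)
true_sign = np.linspace(0, 1, 10)               # hidden target pattern (toy "trajectory")

def window(contains_sign):
    # A bag of candidate sign windows; positive bags hide one near-true instance.
    instances = [rng.normal(scale=0.5, size=10) for _ in range(6)]
    if contains_sign:
        instances[int(rng.integers(6))] = true_sign + rng.normal(scale=0.05, size=10)
    return instances

positive_bags = [window(True) for _ in range(8)]    # subtitle contains the target word
negative_bags = [window(False) for _ in range(8)]   # subtitle does not

def bag_distance(candidate, bag):
    return min(float(np.linalg.norm(candidate - inst)) for inst in bag)

def mil_score(candidate):
    # High when the candidate matches something in every positive bag but not in negatives.
    pos = np.mean([bag_distance(candidate, b) for b in positive_bags])
    neg = np.mean([bag_distance(candidate, b) for b in negative_bags])
    return neg - pos

candidates = [inst for bag in positive_bags for inst in bag]
best = max(candidates, key=mil_score)
print("distance of best candidate to the true sign:",
      round(float(np.linalg.norm(best - true_sign)), 3))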

Collaboration


Dive into Mark Everingham's collaborations.

Top Co-Authors

Alex Holub
California Institute of Technology

Pietro Perona
California Institute of Technology

Josef Sivic
École Normale Supérieure