Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where James Charles is active.

Publication


Featured research published by James Charles.


Asian Conference on Computer Vision | 2014

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

Tomas Pfister; Karen Simonyan; James Charles; Andrew Zisserman

Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is, to our knowledge, the first to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background.
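
To make the multi-frame idea concrete, here is a minimal PyTorch sketch (illustrative only, not the paper's architecture) of a ConvNet that stacks consecutive RGB frames along the channel axis so the network can exploit temporal context, and regresses one heatmap per upper-body joint:

```python
# Hypothetical sketch: a small ConvNet taking several stacked RGB frames
# and producing per-joint heatmaps; peaks mark estimated joint locations.
import torch
import torch.nn as nn

class MultiFramePoseNet(nn.Module):
    def __init__(self, n_frames=3, n_joints=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * n_frames, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(128, n_joints, kernel_size=1)

    def forward(self, frames):           # frames: (B, 3 * n_frames, H, W)
        return self.head(self.features(frames))

net = MultiFramePoseNet()
clip = torch.randn(1, 9, 128, 128)       # 3 stacked RGB frames
heatmaps = net(clip)                      # (1, 7, 32, 32)
```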


International Conference on Computer Vision | 2011

Learning shape models for monocular human pose estimation from the Microsoft Xbox Kinect

James Charles; Mark Everingham

We propose a method for learning shape models enabling accurate articulated human pose estimation from a single image. Where previous work has typically employed simple geometric models of human limbs, e.g. cylinders, which lead to rectangular projections, we propose to learn a generative model of limb shape which can capture the wide variation in shape due to varying anatomy and pose. The model is learnt from silhouette, depth and 3D pose data provided by a Microsoft Xbox Kinect, such that no manual annotation is required. We employ the learnt model in a pictorial structure model framework and demonstrate improved pose estimation from single silhouettes compared to using conventional rectangular limb models.
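
As a rough illustration of one way to build a generative shape model, the sketch below fits PCA modes to vectorised limb silhouette masks; the random masks stand in for the Kinect-derived training data, and the paper's actual model is more sophisticated:

```python
# Illustrative sketch: PCA over vectorised limb masks as a simple
# generative limb-shape model (random data replaces real Kinect masks).
import numpy as np

rng = np.random.default_rng(0)
masks = rng.random((500, 32 * 16)) > 0.5        # 500 toy limb masks, 32x16
X = masks.astype(np.float64)

mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:10]                                  # top 10 shape modes

def sample_limb_shape(coeffs):
    """Generate a limb mask from shape coefficients."""
    return (mean + coeffs @ basis).reshape(32, 16) > 0.5

new_shape = sample_limb_shape(rng.normal(0, 1, 10) * S[:10] / np.sqrt(len(X)))
```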


European Conference on Computer Vision | 2014

Domain-Adaptive Discriminative One-Shot Learning of Gestures

Tomas Pfister; James Charles; Andrew Zisserman

The objective of this paper is to recognize gestures in videos – both localizing the gesture and classifying it into one of multiple classes.


Computer Vision and Pattern Recognition | 2016

Personalizing Human Video Pose Estimation

James Charles; Tomas Pfister; Derek R. Magee; David C. Hogg; Andrew Zisserman

We propose a personalized ConvNet pose estimator that automatically adapts itself to the uniqueness of a person's appearance to improve pose estimation in long videos. We make the following contributions: (i) we show that given a few high-precision pose annotations, e.g. from a generic ConvNet pose estimator, additional annotations can be generated throughout the video using a combination of image-based matching for temporally distant frames, and dense optical flow for temporally local frames, (ii) we develop an occlusion-aware self-evaluation model that is able to automatically select the high-quality and reject the erroneous additional annotations, and (iii) we demonstrate that these high-quality annotations can be used to fine-tune a ConvNet pose estimator and thereby personalize it to lock on to key discriminative features of the person's appearance. The outcome is a substantial improvement in the pose estimates for the target video using the personalized ConvNet compared to the original generic ConvNet. Our method outperforms the state of the art (including top ConvNet methods) by a large margin on three standard benchmarks, as well as on a new challenging YouTube video dataset. Furthermore, we show that training from the automatically generated annotations can be used to improve the performance of a generic ConvNet on other benchmarks.
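
The temporally-local propagation step can be sketched with dense optical flow; the snippet below uses OpenCV's Farneback flow as a stand-in (the paper does not prescribe this particular flow method) to push one annotated joint into the next frame:

```python
# Sketch of annotation propagation via dense optical flow.
import cv2
import numpy as np

def propagate_joint(prev_gray, next_gray, joint_xy):
    # Farneback params (positional): pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    x, y = int(round(joint_xy[0])), int(round(joint_xy[1]))
    dx, dy = flow[y, x]                  # per-pixel (dx, dy) displacement
    return (joint_xy[0] + dx, joint_xy[1] + dy)

# Toy frames; real use feeds consecutive grayscale video frames.
prev_f = np.zeros((240, 320), np.uint8)
next_f = np.zeros((240, 320), np.uint8)
wrist = propagate_joint(prev_f, next_f, (160.0, 120.0))
```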


British Machine Vision Conference | 2012

Automatic and Efficient Long Term Arm and Hand Tracking for Continuous Sign Language TV Broadcasts

Tomas Pfister; James Charles; Mark Everingham; Andrew Zisserman

We present a fully automatic arm and hand tracker that detects joint positions over continuous sign language video sequences of more than an hour in length. Our framework replicates the state-of-the-art long-term tracker by Buehler et al. (IJCV 2011), but does not require manual annotation and, after automatic initialisation, performs tracking in real time. We cast the problem as a generic frame-by-frame random forest regressor without a strong spatial model. Our contributions are (i) a co-segmentation algorithm that automatically separates the signer from any signed TV broadcast using a generative layered model; (ii) a method of predicting joint positions given only the segmentation and a colour model using a random forest regressor; and (iii) demonstrating that the random forest can be trained from an existing semi-automatic, but computationally expensive, tracker. The method is applied to signing footage with changing background, challenging imaging conditions, and for different signers. We achieve superior joint localisation results to those obtained using the method of Buehler et al.
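
A minimal sketch of the frame-by-frame regression idea, using scikit-learn's random forest to map toy segmentation/colour features to 2D joint coordinates (the real feature design is far richer than this):

```python
# Illustrative sketch: a random forest regressing joint positions
# from per-frame features; random data stands in for real features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
features = rng.random((2000, 64))        # e.g. downsampled silhouette/colour
joints = rng.random((2000, 2 * 7))       # (x, y) for 7 upper-body joints

forest = RandomForestRegressor(n_estimators=50, max_depth=12, random_state=0)
forest.fit(features, joints)

pred = forest.predict(features[:1]).reshape(7, 2)   # joints for one frame
```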


British Machine Vision Conference | 2013

Domain Adaptation for Upper Body Pose Tracking in Signed TV Broadcasts

James Charles; Tomas Pfister; Derek R. Magee; David C. Hogg; Andrew Zisserman

The objective of this work is to estimate upper body pose for signers in TV broadcasts. Given suitable training data, the pose is estimated using a random forest body joint detector. However, obtaining such training data can be costly. The novelty of this paper is a method of transfer learning which is able to harness existing training data and use it for new domains. Our contributions are: (i) a method for adapting existing training data to generate new training data by synthesis for signers with different appearances, and (ii) a method for personalising training data. As a case study we show how the appearance of the arms for different clothing, specifically short and long sleeved clothes, can be modelled to obtain person-specific trackers. We demonstrate that the transfer learning and person specific trackers significantly improve pose estimation performance.
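
One way to picture the appearance-synthesis idea is per-channel histogram matching of the arm pixels toward a new signer's colour model; the snippet below is a hypothetical stand-in, not the paper's actual synthesis method:

```python
# Hypothetical sketch: re-colour arm pixels of existing training frames
# to match a new signer via per-channel histogram matching.
import numpy as np

def match_histogram(src, ref):
    """Map src values so their distribution matches ref (one channel)."""
    s_vals, s_counts = np.unique(src, return_counts=True)
    r_vals, r_counts = np.unique(ref, return_counts=True)
    s_cdf = np.cumsum(s_counts) / src.size
    r_cdf = np.cumsum(r_counts) / ref.size
    mapped = np.interp(s_cdf, r_cdf, r_vals)
    return mapped[np.searchsorted(s_vals, src)]

rng = np.random.default_rng(2)
arm_pixels = rng.integers(0, 256, (500, 3))      # pixels under the arm mask
target_arm = rng.integers(100, 200, (800, 3))    # new signer's arm colours
synthesised = np.stack(
    [match_histogram(arm_pixels[:, c], target_arm[:, c]) for c in range(3)],
    axis=1)
```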


IEEE Vehicular Technology Magazine | 2011

ECOGEM: A European Framework-7 Project

Jianmin Jiang; James Charles; Konstantinos P. Demestichas

In this article, we describe a new European Framework 7-funded research project, EcoGem, and introduce a new concept of experience sharing and intelligent optimization of route planning via machine-learning approaches. EcoGem combines machine-learning techniques with communication technologies to produce an advanced driver assistance system (ADAS) with a range of novel functionalities, including: 1) automatic generation of code-based traffic indications that allow other EcoGem-enabled fully electric vehicles (FEVs) to share the experience for every section of route (journey) traveled, 2) automatic learning from past and online experience to intelligently optimize route planning and energy consumption, and 3) automatic and instant updating of the route planning and optimization process via ongoing experience shared by EcoGem-enabled FEVs.
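
A toy sketch of the experience-sharing concept: vehicles report measured per-segment energy use to a shared model, and route planning minimises predicted energy with Dijkstra's algorithm. All names and the simple averaging scheme are illustrative, not EcoGem's actual design:

```python
# Toy sketch: shared per-segment energy model + energy-optimal routing.
import heapq
from collections import defaultdict

energy = defaultdict(lambda: [0.0, 0])          # edge -> [kWh sum, count]

def report(edge, kwh):
    """An FEV shares its measured consumption for one road segment."""
    energy[edge][0] += kwh
    energy[edge][1] += 1

def cost(edge, default=1.0):
    s, n = energy[edge]
    return s / n if n else default

def best_route(graph, start, goal):
    """Dijkstra over predicted per-segment energy."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        c, node, path = heapq.heappop(queue)
        if node == goal:
            return c, path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            heapq.heappush(queue, (c + cost((node, nxt)), nxt, path + [nxt]))
    return float("inf"), []

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
report(("A", "C"), 0.4)
report(("C", "D"), 0.5)
print(best_route(graph, "A", "D"))               # prefers the reported route
```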


European Conference on Computer Vision | 2016

Virtual Immortality: Reanimating Characters from TV Shows

James Charles; Derek R. Magee; David C. Hogg

The objective of this work is to build virtual talking avatars of characters fully automatically from TV shows. From this unconstrained data, we show how to capture a character’s style of speech, visual appearance and language in an effort to construct an interactive avatar of the person and effectively immortalize them in a computational model. We make three contributions: (i) a complete framework for producing a generative model of the audio-visual appearance and language of characters from TV shows; (ii) a novel method for aligning transcripts to video using the audio; and (iii) a fast audio segmentation system for silencing non-spoken audio from TV shows. Our framework is demonstrated using all 236 episodes from the TV series Friends (≈ 97 h of video) and shown to generate novel sentences as well as character-specific speech and video.
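
The audio segmentation contribution can be illustrated with a simple energy-based speech/non-speech segmenter; the window size and threshold below are illustrative choices, not the paper's:

```python
# Sketch: flag high-energy windows as speech, return (start, end) spans.
import numpy as np

def speech_segments(audio, sr, win=0.02, thresh_db=-35.0):
    hop = int(win * sr)
    n = len(audio) // hop
    frames = audio[: n * hop].reshape(n, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)
    db = 20 * np.log10(rms / (np.abs(audio).max() + 1e-12))
    voiced = db > thresh_db                      # True where energy is high
    # Collapse consecutive voiced frames into (start_sec, end_sec) spans.
    spans, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start * win, i * win)); start = None
    if start is not None:
        spans.append((start * win, n * win))
    return spans

sr = 16000
t = np.arange(sr) / sr
audio = np.where(t < 0.5, np.sin(2 * np.pi * 220 * t), 0.0)
print(speech_segments(audio, sr))                # -> [(0.0, 0.5)]
```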


British Machine Vision Conference | 2013

Large-scale Learning of Sign Language by Watching TV

Tomas Pfister; James Charles; Andrew Zisserman

The goal of this work is to automatically learn a large number of signs from sign language-interpreted TV broadcasts. We achieve this by exploiting supervisory information available in the subtitles of the broadcasts. However, this information is both weak and noisy and this leads to a challenging correspondence problem when trying to identify the temporal window of the sign. We make the following contributions: (i) we show that, somewhat counter-intuitively, mouth patterns are highly informative for isolating words in a language for the Deaf, and their co-occurrence with signing can be used to significantly reduce the correspondence search space; and (ii) we develop a multiple instance learning method using an efficient discriminative search, which determines a candidate list for the sign with both high recall and precision. We demonstrate the method on videos from BBC TV broadcasts, and achieve higher accuracy and recall than previous methods, despite using much simpler features.
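
The weak-supervision setup can be sketched as follows: each subtitle yields a bag of candidate temporal windows, a mouthing score prunes the bag, and the best-scoring surviving window is kept. The random scores below are stand-ins for real features:

```python
# Toy sketch of MIL candidate selection with mouth-pattern pruning.
import numpy as np

rng = np.random.default_rng(3)
n_bags, n_windows = 20, 40
sign_score = rng.random((n_bags, n_windows))     # discriminative sign score
mouth_score = rng.random((n_bags, n_windows))    # mouthing co-occurrence

def best_candidates(sign_score, mouth_score, mouth_thresh=0.5):
    picks = []
    for b in range(sign_score.shape[0]):
        ok = mouth_score[b] > mouth_thresh       # prune the search space
        if not ok.any():
            picks.append(None); continue
        idx = np.flatnonzero(ok)
        picks.append(idx[np.argmax(sign_score[b, idx])])
    return picks

print(best_candidates(sign_score, mouth_score)[:5])
```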


British Machine Vision Conference | 2017

Real-time Factored ConvNets: Extracting the X Factor in Human Parsing

James Charles; Ignas Budvytis; Roberto Cipolla

We propose a real-time and lightweight multi-task style ConvNet (termed a Factored ConvNet) for human body parsing in images or video. Factored ConvNets have isolated areas which perform known sub-tasks, such as object localization or edge detection. We call this area and sub-task pair an X factor. Unlike multi-task ConvNets, which have independent tasks, the Factored ConvNet’s sub-task has a direct effect on the main task outcome. In this paper we show how to isolate the X factor of foreground/background (f/b) subtraction from the main task of segmenting human body images into 31 different body part types. Knowledge of this X factor leads to a number of benefits for the Factored ConvNet: 1) ease of network transfer to other image domains, 2) the ability to personalize to humans in video, and 3) easy model performance boosts. All are achieved by either efficient network update or replacement of the X factor, whilst avoiding catastrophic forgetting of previously learnt body part dependencies and structure. We show these benefits on a large dataset of images and also on YouTube videos.
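
A hypothetical PyTorch sketch of the X-factor idea: a dedicated foreground/background branch whose output gates the features feeding the body-part head, so that branch alone can be swapped or fine-tuned for a new domain while the part-parsing weights stay frozen. Layer sizes are illustrative:

```python
# Hypothetical sketch of a Factored ConvNet with a swappable f/b branch.
import torch
import torch.nn as nn

class FactoredParser(nn.Module):
    def __init__(self, n_parts=31):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # The X factor: a small f/b sub-network with its own output.
        self.fb = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                nn.Conv2d(16, 1, 1), nn.Sigmoid())
        # Main task consumes backbone features gated by the f/b mask.
        self.parts = nn.Conv2d(32, n_parts, 1)

    def forward(self, x):
        feats = self.backbone(x)
        fb = self.fb(feats)                      # (B, 1, H, W) fg probability
        return self.parts(feats * fb), fb

net = FactoredParser()
# Personalise: update only the X factor, keeping part structure intact.
for p in net.parameters():
    p.requires_grad = False
for p in net.fb.parameters():
    p.requires_grad = True
```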

Collaboration


Dive into James Charles's collaborations.

Top Co-Authors

Konstantinos P. Demestichas

National Technical University of Athens
