
Publications


Featured research published by Ben Daubney.


Computer Graphics Forum | 2012

State of the Art Report on Video-Based Graphics and Video Visualization

Rita Borgo; Min Chen; Ben Daubney; Edward Grundy; Gunther Heidemann; Benjamin Höferlin; Markus Höferlin; Heike Leitte; Daniel Weiskopf; Xianghua Xie

In recent years, a collection of new techniques that deal with video as input data has emerged in computer graphics and visualization. In this survey, we report the state of the art in video‐based graphics and video visualization. We provide a review of techniques for making photo‐realistic or artistic computer‐generated imagery from videos, as well as methods for creating summary and/or abstract visual representations to reveal important features and events in videos. We provide a new taxonomy to categorize the concepts and techniques in this newly emerged body of knowledge. To support this review, we also give a concise overview of the major advances in automated video analysis, as some techniques in this field (e.g. feature extraction, detection, tracking and so on) have been featured in video‐based modelling and rendering pipelines for graphics and visualization.
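
The automated video analysis the survey refers to typically starts with low-level feature tracking. The following minimal Python sketch, using OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade optical flow, illustrates the kind of generic tracking step such pipelines build on; it is an assumed, representative example, not the method of any specific surveyed paper.

```python
# Generic feature-extraction-and-tracking step of the kind the survey covers.
# Illustrative only: Shi-Tomasi corners seeded once, then tracked with
# pyramidal Lucas-Kanade flow frame to frame.
import cv2
import numpy as np

def track_features(video_path, max_corners=200):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError(f"cannot read {video_path}")
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect corners to seed the tracker.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    tracks = []  # one (frame_index, points) entry per frame
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok or pts is None or len(pts) == 0:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade optical flow from the previous frame.
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)
        tracks.append((idx, pts.reshape(-1, 2)))
        prev_gray, idx = gray, idx + 1
    cap.release()
    return tracks
```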


Computer Vision and Pattern Recognition | 2011

Tracking 3D human pose with large root node uncertainty

Ben Daubney; Xianghua Xie

Representing articulated objects as a graphical model has gained much popularity in recent years; often the root node of the graph describes the global position and orientation of the object. In this work a method is presented to robustly track 3D human pose by permitting greater uncertainty to be modeled over the root node than existing techniques allow. Significantly, this is achieved without increasing the uncertainty of the remaining parts of the model. The benefit is that a greater volume of the posterior can be supported, making the approach less vulnerable to tracking failure. Given a hypothesis of the root node state, a novel method is presented to estimate the posterior over the remaining parts of the body conditioned on this value. All probability distributions are approximated using a single Gaussian, allowing inference to be carried out in closed form. A set of deterministically selected sample points is used that allows the posterior to be updated for each part with just seven image likelihood evaluations, making the approach extremely efficient. Multiple root node states are supported and propagated using standard sampling techniques. We believe this to be the first work devoted to efficient tracking of human pose whilst modeling large uncertainty in the root node, and we demonstrate the presented method to be more robust to tracking failures than existing approaches.
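
Seven likelihood evaluations per part is consistent with an unscented-style scheme of 2n+1 deterministic sigma points for a 3-dimensional part state. The sketch below shows that reading of the idea: sigma points drawn from the Gaussian prior, each scored once by the image likelihood, and the Gaussian refit from the weighted points. The sigma-point scheme and the likelihood interface are assumptions; the paper does not publish code.

```python
# Hedged sketch: 2n+1 = 7 sigma points for a 3D part state, one image
# likelihood evaluation each, then a weighted Gaussian refit.
import numpy as np

def update_part_gaussian(mean, cov, image_likelihood, kappa=0.0):
    n = mean.size                      # n = 3 -> 7 sigma points
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = [mean] + [mean + L[:, i] for i in range(n)] \
                   + [mean - L[:, i] for i in range(n)]
    sigma = np.array(sigma)            # (2n+1, n)
    # One image-likelihood evaluation per sigma point: seven in total.
    w = np.array([image_likelihood(x) for x in sigma])
    w /= w.sum()
    new_mean = w @ sigma               # weighted mean
    d = sigma - new_mean
    new_cov = (w[:, None] * d).T @ d   # weighted covariance
    return new_mean, new_cov
```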


Eurographics | 2011

A Survey on Video-based Graphics and Video Visualization

Rita Borgo; Min Chen; Ben Daubney; Edward Grundy; Gunther Heidemann; Benjamin Höferlin; Markus Höferlin; Heike Jänicke; Daniel Weiskopf; Xianghua Xie

In recent years, a collection of new techniques that deal with videos as the input data has emerged in computer graphics and visualization. In this survey, we report the state of the art in video-based graphics and video visualization. We provide a comprehensive review of techniques for making photo-realistic or artistic computer-generated imagery from videos, as well as methods for creating summary and/or abstract visual representations to reveal important features and events in videos. We propose a new taxonomy to categorize the concepts and techniques in this newly-emerged body of knowledge. To support this review, we also give a concise overview of the major advances in automated video analysis, as some techniques in this field (e.g., feature extraction, detection, tracking and so on) have been featured in video-based modeling and rendering pipelines for graphics and visualization.


Computer Vision and Pattern Recognition | 2008

Real-time pose estimation of articulated objects using low-level motion

Ben Daubney; David P. Gibson; Neill W. Campbell

We present a method that is capable of tracking and estimating the pose of articulated objects in real-time. This is achieved by using a bottom-up approach to detect instances of the object in each frame; these detections are then linked together using a high-level a priori motion model. Unlike other approaches that rely on appearance, our method is entirely dependent on motion; initial low-level part detection is based on how a region moves as opposed to its appearance. This work is best described as pictorial structures using motion. A sparse cloud of points extracted using a standard feature tracker is used as observational data; this data contains noise that is not Gaussian in nature but systematic, owing to tracking errors. Using a probabilistic framework we are able to overcome both corrupt and missing data whilst still inferring new poses from a generative model. Our approach requires no manual initialisation, and we show results for a number of complex scenes and different classes of articulated object, demonstrating both the robustness and versatility of the presented technique.
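
The pictorial-structures part of the pipeline is standard dynamic programming over a kinematic tree. The sketch below simplifies the tree to a chain: each part has a unary cost (here, from a motion-based detector) and a pairwise cost to its parent from the a priori configuration model. The cost functions and chain topology are assumptions for illustration; they are not the paper's exact formulation.

```python
# Min-sum (Viterbi) inference for a chain-structured pictorial structure.
import numpy as np

def chain_pictorial_structures(unary, pairwise):
    """unary: list of (S,) cost arrays, one per part (S candidate states).
    pairwise: list of (S, S) arrays; pairwise[i][a, b] = cost of part i+1
    in state b given part i in state a. Returns the min-cost state sequence."""
    P = len(unary)
    cost = unary[0].copy()
    back = []
    for i in range(1, P):
        # Combine accumulated cost, transition cost and the next unary.
        total = cost[:, None] + pairwise[i - 1] + unary[i][None, :]
        back.append(total.argmin(axis=0))
        cost = total.min(axis=0)
    # Backtrack from the cheapest final state.
    states = [int(cost.argmin())]
    for bp in reversed(back):
        states.append(int(bp[states[-1]]))
    return list(reversed(states))
```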


Graphical Models | 2014

A bag of words approach to subject specific 3D human pose interaction classification with random decision forests

Jingjing Deng; Xianghua Xie; Ben Daubney

In this work, we investigate whether it is possible to distinguish conversational interactions from observing human motion alone, in particular subject specific gestures in 3D. We adopt Kinect sensors to obtain 3D displacement and velocity measurements, followed by wavelet decomposition to extract low level temporal features. These features are then generalized to form a visual vocabulary that can be further generalized to a set of topics from temporal distributions of the visual vocabulary. A subject specific supervised learning approach based on Random Forests is used to classify the testing sequences into seven different conversational scenarios. The conversational scenarios concerned in this work have rather subtle differences among them. Unlike typical action or event recognition, each interaction in our case contains many instances of primitive motions and actions, many of which are shared among different conversation scenarios. That is, the interactions we are concerned with are not micro or instant events, such as hugging or a high-five, but rather interactions over a period of time that consist of rather similar individual motions, micro actions and interactions. We believe this is among the first works devoted to subject specific conversational interaction classification using 3D pose features and to show that this task is indeed possible.
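
A minimal sketch of the pipeline described in this abstract follows: wavelet features from per-joint motion signals, a k-means visual vocabulary, per-sequence word histograms, and a Random Forest over the seven scenario labels. The wavelet family, vocabulary size and forest settings are assumptions, not the paper's reported parameters.

```python
# Hedged sketch: wavelet features -> k-means vocabulary -> histogram -> RF.
import numpy as np
import pywt
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def wavelet_features(track, wavelet="db2", level=3):
    # track: (T,) displacement or velocity signal for one joint dimension.
    coeffs = pywt.wavedec(track, wavelet, level=level)
    # Truncated to a fixed length purely for illustration.
    return np.concatenate(coeffs)[:32]

def sequence_histogram(windows, vocab):
    # Quantise each low-level feature to its nearest visual word.
    words = vocab.predict(np.asarray(windows))
    return np.bincount(words, minlength=vocab.n_clusters).astype(float)

# Usage sketch (train_windows pooled from all training sequences):
# vocab = KMeans(n_clusters=100, random_state=0).fit(train_windows)
# X = [sequence_histogram(w, vocab) for w in per_sequence_windows]
# clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```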


Computer Vision and Image Understanding | 2012

Estimating pose of articulated objects using low-level motion

Ben Daubney; David P. Gibson; Neill W. Campbell

In this work a method is presented to track and estimate the pose of articulated objects using the motion of a sparse set of moving features. This is achieved by using a bottom-up generative approach based on the Pictorial Structures representation [1]. However, unlike previous approaches that rely on appearance, our method is entirely dependent on motion; initial low-level part detection is based on how a region moves as opposed to its appearance. This work is best described as Pictorial Structures using motion. A standard feature tracker is used to automatically extract a sparse set of features. These features typically contain many tracking errors; however, the presented approach is able to overcome both this and their sparsity. The proposed method is applied to two problems: 2D pose estimation of articulated objects walking side-on to the camera, and 3D pose estimation of humans walking and jogging at arbitrary orientations to the camera. In each domain quantitative results are reported that improve on the state of the art. The motivation of this work is to illustrate the information present in low-level motion that can be exploited for the task of pose estimation.
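
The distinguishing idea, detection by how a region moves rather than how it looks, can be sketched as scoring a tracked feature's velocity under each part's learnt motion model. Idealising each part's model as a single Gaussian is an assumption made for illustration here.

```python
# Motion-based (not appearance-based) part detector: score one tracked
# feature's frame-to-frame velocity under each part's motion model.
import numpy as np
from scipy.stats import multivariate_normal

def part_likelihoods(velocity, motion_models):
    """velocity: (2,) observed displacement of one tracked feature.
    motion_models: {part_name: (mean, cov)} learnt from training motion.
    Returns a normalised probability that the feature belongs to each part."""
    scores = {p: multivariate_normal.pdf(velocity, mean=m, cov=c)
              for p, (m, c) in motion_models.items()}
    z = sum(scores.values())
    return {p: s / z for p, s in scores.items()}
```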


Articulated Motion and Deformable Objects | 2010

Estimating 3D pose via stochastic search and expectation maximization

Ben Daubney; Xianghua Xie

In this paper an approach is described to estimate 3D pose using a part based stochastic method. A proposed representation of the human body, defined over joints, is explored that employs full conditional models learnt between connected joints. This representation is compared against a popular alternative defined over parts that uses approximated limb conditionals. It is shown that using full limb conditionals results in a model that is far more representative of the original training data. Furthermore, it is demonstrated that Expectation Maximization is suitable for estimating 3D pose, and that better convergence is achieved when using full limb conditionals. To demonstrate the efficacy of the proposed method it is applied to the domain of 3D pose estimation from a single monocular image. Quantitative results are provided using the HumanEva dataset, which confirm that the proposed method outperforms the competing part based model. In this work just a single model is learnt to represent all actions contained in the dataset; it is applied to all subjects viewed from differing angles.
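
One loose reading of the EM idea here is an iterative loop in which pose samples are weighted by the image likelihood (E-step) and the pose distribution is refit from the weighted samples before resampling (M-step). The Gaussian refit and resampling scheme below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged EM-style sketch over pose samples.
import numpy as np

def em_pose(samples, image_likelihood, iters=10,
            rng=np.random.default_rng(0)):
    """samples: (N, D) array of initial pose hypotheses."""
    mean, cov = None, None
    for _ in range(iters):
        # E-step: responsibilities from the image likelihood.
        w = np.array([image_likelihood(s) for s in samples])
        w /= w.sum()
        # M-step: refit a Gaussian from the weighted samples.
        mean = w @ samples
        d = samples - mean
        cov = (w[:, None] * d).T @ d + 1e-6 * np.eye(samples.shape[1])
        # Resample from the refit distribution.
        samples = rng.multivariate_normal(mean, cov, size=len(samples))
    return mean, cov
```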


International Conference on Computer Vision | 2009

Monocular 3D human pose estimation using sparse motion features

Ben Daubney; David P. Gibson; Neill W. Campbell

In this paper we demonstrate that the motion of a sparse set of tracked features can be used to extract 3D pose from a single viewpoint. The purpose of this work is to illustrate the wealth of information present in the temporal dimension of a sequence of images that is currently not being exploited. Our approach is entirely dependent upon motion. We use low-level part detectors consisting of 3D motion models; these describe probabilistically how well the observed motion of a tracked feature fits each model. Given these initial detections, a bottom-up approach is employed to find the most likely configuration of a person in each frame. Models used are learnt directly from motion capture data and no training is performed using descriptors derived from image sequences. The result is that the presented approach can be applied to people moving at arbitrary and previously unseen orientations relative to the camera, making it particularly versatile and robust. We evaluate our approach for both walking and jogging on the HumanEva data set, where we achieve an accuracy of 65.8±23.3 mm and 69.4±20.2 mm for each action respectively.
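
Figures of the form 65.8±23.3 mm are the usual HumanEva-style error measure: mean and standard deviation of the Euclidean distance between estimated and ground-truth 3D joint positions. A short sketch of that computation follows; the exact aggregation over frames and joints is assumed rather than taken from the paper.

```python
# Mean ± std of per-frame average 3D joint error, in millimetres.
import numpy as np

def pose_error_mm(estimated, ground_truth):
    """estimated, ground_truth: (F, J, 3) arrays of F frames, J joints."""
    per_joint = np.linalg.norm(estimated - ground_truth, axis=-1)
    per_frame = per_joint.mean(axis=1)          # average over joints
    return per_frame.mean(), per_frame.std()    # e.g. (65.8, 23.3)
```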


British Machine Vision Conference | 2011

Entropy Driven Hierarchical Search for 3D Human Pose Estimation

Ben Daubney; Xianghua Xie

3D human pose estimation from a single monocular image is an extremely difficult problem. Currently there are two main approaches to solving this problem: the first is to learn a direct mapping from image features to 3D pose [1]; the second is to first extract 2D pose as an intermediate stage and then 'lift' this to a 3D pose [2]. The limitation of both of these approaches is that they are only applicable to poses similar to those represented in the original training set, e.g. walking. It is unlikely they will scale to extract arbitrary 3D poses. In contrast, in the domain of 2D pose estimation current state-of-the-art methods have been shown capable of detecting poses that are much more varied [3]. This has been achieved using generative models built around the Pictorial Structures representation, which decomposes pose estimation into a search across individual parts [4]. In this paper we present a generative method to extract 3D pose from single images using a part based representation. The method is stochastic, though in contrast to methods used for 3D tracking (e.g. the particle filter), where the search space in each frame is tightly constrained by previous observations, in single image pose estimation the search space is much larger. To permit a search over this space a generative prior model is learnt from motion capture data. Stochastic samples are used to approximate this prior and to facilitate its update. In effect, the initial prior is iteratively deformed to the posterior distribution. The body is represented by a set of ten parts; each part has a fixed length and connected parts are forced to join at fixed locations. The conditional distribution between two connected parts is modeled by first learning a joint distribution using a GMM $p(x_i, x_j \mid \theta_{ij})$, where $x_i$ and $x_j$ are the states of the $i$th and $j$th parts respectively and $\theta_{ij}$ is the set of model parameters. As each model is represented using a GMM, the model parameters are defined as $\theta_{ij} = \{\lambda^k_{ij}, \mu^k_{ij}, \Sigma^k_{ij}\}_{k=1}^{K}$, where $K$ is the number of components in the model and $\lambda^k_{ij}, \mu^k_{ij}, \Sigma^k_{ij}$ represent the $k$th component's weight, mean and covariance respectively. For efficiency all covariances used to represent limb conditionals are diagonal and can be partitioned such that $\Sigma^k_{ij} = \mathrm{diag}(\Lambda^k_{ii}, \Lambda^k_{jj})$, and likewise $\mu^k_{ij} = (\mu^k_i, \mu^k_j)$. Given a value for $x_j$ (e.g. a sample), the conditional distribution $p(x_i \mid x_j, \theta_{ij})$ is first calculated from the joint distribution $p(x_i, x_j \mid \theta_{ij})$, following which a sample $x_i$ can be drawn from it. The conditional distribution $p(x_i \mid x_j, \theta_{ij})$ is also a GMM, with parameters $\{\lambda^k_i, \mu^k_i, \Lambda^k_{ii}\}_{k=1}^{K}$. The component weights are proportional to the marginal distribution, $\lambda^k_i \propto p(x_j \mid \theta^k_{ij})$, which is calculated from the normal distribution $p(x_j \mid \theta^k_{ij}) = \lambda^k_{ij}\,\mathcal{N}(x_j; \mu^k_j, \Lambda^k_{jj})$. Note that this conditional model is different from the typical approximation, in which the conditional is approximated by $p(x_{ij} \mid \theta_{ij})$, where $x_{ij}$ is the value of $x_i$ in the local frame of reference of $x_j$ [3]. A benefit of learning a full conditional model between neighbouring parts is that different GMM components learnt in quaternion space correspond to different spatial locations in $\mathbb{R}^3$. This is illustrated in Fig. 1, where it can clearly be seen that this representation captures multiple modes.
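
The conditional sampling described in the equations above translates directly into code: with block-diagonal per-component covariance, $x_i$ and $x_j$ are independent within a component, so the conditional reduces to reweighting the components by $\lambda^k_{ij}\,\mathcal{N}(x_j; \mu^k_j, \Lambda^k_{jj})$ and drawing $x_i$ from the selected component's marginal. The sketch below follows that structure; the array shapes and interfaces are assumptions.

```python
# Draw x_i from the full GMM conditional p(x_i | x_j) described above.
import numpy as np
from scipy.stats import multivariate_normal

def sample_conditional(x_j, weights, mu_i, mu_j, var_i, var_j,
                       rng=np.random.default_rng(0)):
    """weights: (K,); mu_i, var_i: (K, Di); mu_j, var_j: (K, Dj).
    var_* hold the diagonals of the (diagonal) covariances."""
    # Reweight components by how well each explains the conditioning value.
    marg = np.array([weights[k] *
                     multivariate_normal.pdf(x_j, mu_j[k], np.diag(var_j[k]))
                     for k in range(len(weights))])
    marg /= marg.sum()
    k = rng.choice(len(weights), p=marg)
    # Diagonal covariance => x_i and x_j are independent within a component,
    # so the component conditional is simply N(mu_i^k, diag(var_i^k)).
    return mu_i[k] + np.sqrt(var_i[k]) * rng.standard_normal(mu_i[k].shape)
```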


Advanced Concepts for Intelligent Vision Systems | 2013

Recognizing Conversational Interaction Based on 3D Human Pose

Jingjing Deng; Xianghua Xie; Ben Daubney; Hui Fang; Phil W. Grant

In this paper, we take a bag of visual words approach to investigate whether it is possible to distinguish conversational scenarios from observing human motion alone, in particular gestures in 3D. The conversational interactions concerned in this work have rather subtle differences among them. Unlike typical action or event recognition, each interaction in our case contains many instances of primitive motions and actions, many of which are shared among different conversation scenarios. Hence, extracting and learning temporal dynamics are essential. We adopt Kinect sensors to extract low level temporal features. These features are then generalized to form a visual vocabulary that can be further generalized to a set of topics from temporal distributions of the visual vocabulary. A subject-specific supervised learning approach based on both generative and discriminative classifiers is employed to classify the testing sequences into seven different conversational scenarios. We believe this is among the first works devoted to conversational interaction classification using 3D pose features and to show that this task is indeed possible.
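
The "visual words to topics" step can be sketched with a generative topic model over per-sequence word counts, feeding a discriminative classifier. The use of sklearn's LatentDirichletAllocation, the topic count, and the Random Forest below are assumptions; the paper's actual generative and discriminative classifiers may differ.

```python
# Hedged sketch: word histograms -> LDA topic distributions -> classifier.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier

def topics_and_classifier(word_counts, labels, n_topics=10):
    """word_counts: (N, V) visual-word histograms for N training sequences.
    labels: (N,) conversational-scenario labels (seven classes here)."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    topic_dists = lda.fit_transform(word_counts)   # (N, n_topics)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(topic_dists, labels)
    return lda, clf
```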

Collaboration


Dive into Ben Daubney's collaboration.

Top Co-Authors

Min Chen

Huazhong University of Science and Technology
