Peter Carr
Disney Research
Publication
Featured research published by Peter Carr.
ACM Multimedia | 2013
Peter Carr; Michael Mistry; Iain A. Matthews
We present a method to generate aesthetic video from a robotic camera by incorporating a virtual camera operating on a delay, and a hybrid controller which uses feedback from both the robotic and virtual cameras. Our strategy employs a robotic camera to follow a coarse region of interest identified by a realtime computer vision system, and then resamples the captured images to synthesize the video that would have been recorded along a smooth, aesthetic camera trajectory. The smooth motion trajectory is obtained by operating the virtual camera on a short delay, so that perfect knowledge of immediate future events is available. Previous autonomous camera installations have employed either robotic cameras or stationary wide-angle cameras with subregion cropping. Robotic cameras track the subject using realtime sensor data, and regulate a smoothness-latency trade-off through control gains. Fixed cameras post-process the data and suffer significant reductions in image resolution when the subject moves freely over a large area. Our approach provides a solution for broadcasting events from locations that camera operators cannot easily access. We can also offer broadcasters additional actuated camera angles without the overhead of additional human operators. Experiments on our prototype system for college basketball illustrate how our approach better mimics human operators compared to traditional robotic control approaches, while avoiding the loss in resolution that occurs with fixed camera systems.
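As a rough illustration of the delayed-virtual-camera idea, the sketch below (Python; all names are illustrative, and the paper's actual controller and image-resampling pipeline are considerably more involved) smooths a noisy pan measurement by buffering a short window, so that each output sample is computed with knowledge of "future" frames:

from collections import deque
import numpy as np

class DelayedVirtualCamera:
    """Smooth a noisy pan trajectory using a short look-ahead buffer.

    The virtual camera runs `delay` frames behind the live feed, so every
    output sample can be smoothed using `delay` future measurements (a
    simple stand-in for the paper's trajectory synthesis).
    """
    def __init__(self, delay=60):
        self.buffer = deque(maxlen=2 * delay + 1)

    def update(self, measured_pan):
        self.buffer.append(measured_pan)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # still filling the look-ahead window
        # Output the delayed center sample, smoothed over past AND future.
        return float(np.mean(self.buffer))

cam = DelayedVirtualCamera(delay=60)  # roughly 2 s at 30 fps
for t in range(300):
    noisy_pan = 10.0 * np.sin(t / 50.0) + np.random.randn() * 0.5
    smooth_pan = cam.update(noisy_pan)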
European Conference on Computer Vision | 2012
Peter Carr; Yaser Sheikh; Iain A. Matthews
Multiview object detection methods achieve robustness in adverse imaging conditions by exploiting projective consistency across views. In this paper, we present an algorithm that achieves performance comparable to multiview methods from a single camera by employing geometric primitives as proxies for the true 3D shape of objects, such as pedestrians or vehicles. Our key insight is that for a calibrated camera, geometric primitives produce predetermined location-specific patterns in occupancy maps. We use these to define spatially-varying kernel functions of projected shape. This leads to an analytical formation model of occupancy maps as the convolution of locations and projected shape kernels. We estimate object locations by deconvolving the occupancy map using an efficient template similarity scheme. The number of objects and their positions are determined using the mean shift algorithm. The approach is highly parallel because the occupancy probability of a particular geometric primitive at each ground location is an independent computation. The algorithm extends to multiple cameras without requiring significant bandwidth. We demonstrate comparable performance to multiview methods and show robust, realtime object detection on full resolution HD video in a variety of challenging imaging conditions.
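A minimal sketch of the formation model and its inversion, assuming a single fixed kernel rather than the paper's spatially varying, location-specific kernels:

import numpy as np
from scipy.signal import fftconvolve

# Toy ground plane with two objects; the box kernel stands in for a
# projected-shape kernel derived from a geometric primitive.
locations = np.zeros((100, 100))
locations[30, 40] = locations[60, 70] = 1.0
kernel = np.ones((7, 7)) / 49.0
occupancy = fftconvolve(locations, kernel, mode='same')  # formation model

# Estimate locations by template similarity: correlate the occupancy map
# with the (flipped) kernel and keep strong responses.  The paper then
# runs mean shift to determine the number of objects and their positions;
# a simple threshold stands in here.
score = fftconvolve(occupancy, kernel[::-1, ::-1], mode='same')
candidates = np.argwhere(score > 0.9 * score.max())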
International Conference on Data Mining | 2014
Yisong Yue; Patrick Lucey; Peter Carr; Alina Bialkowski; Iain Matthews
We consider the problem of learning predictive models for in-game sports play prediction. Focusing on basketball, we develop models for anticipating near-future events given the current game state. We employ a latent factor modeling approach, which leads to a compact data representation that enables efficient prediction given raw spatiotemporal tracking data. We validate our approach using tracking data from the 2012-2013 NBA season, and show that our model can make accurate in-game predictions. We provide a detailed inspection of our learned factors, and show that our model is interpretable and corresponds to known intuitions of basketball game play.
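The general latent-factor recipe can be illustrated on synthetic data as below (the paper's actual model, features, and prediction targets differ): compress raw spatiotemporal occupancy counts into a compact latent representation, then predict a near-future event from the latent features.

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_raw = rng.poisson(1.0, size=(500, 400))  # game states x court-cell occupancy counts (synthetic)
y = rng.integers(0, 2, size=500)           # hypothetical label: shot attempted soon?

# Compact latent representation of the raw spatiotemporal data.
svd = TruncatedSVD(n_components=20, random_state=0)
X_latent = svd.fit_transform(X_raw)

# Efficient in-game prediction from the latent factors.
clf = LogisticRegression(max_iter=1000).fit(X_latent, y)
proba_next_event = clf.predict_proba(X_latent[:5])[:, 1]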
International Conference on Data Mining | 2014
Alina Bialkowski; Patrick Lucey; Peter Carr; Yisong Yue; Sridha Sridharan; Iain A. Matthews
To the trained eye, experts can often identify a team by its unique style of play, as reflected in its movement, passing, and interactions. In this paper, we present a method which can accurately determine the identity of a team from spatiotemporal player tracking data. We do this by utilizing a formation descriptor which is found by minimizing the entropy of role-specific occupancy maps. We show that our approach is significantly better at identifying different teams than standard measures (e.g., shots, passes, etc.). We demonstrate the utility of our approach using an entire season of Prozone player tracking data from a top-tier professional soccer league.
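A simplified sketch of the role-assignment machinery behind such a formation descriptor (function names are illustrative): players are matched to roles frame by frame with the Hungarian algorithm, and the compactness of each role's occupancy map can be scored by its entropy.

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_roles(frame_positions, role_means):
    """Match each player in one frame to a role by minimizing total
    squared distance to the current role means (one EM-style step of
    the entropy-minimization idea)."""
    cost = ((frame_positions[:, None, :] - role_means[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return cols  # role index for each player

def occupancy_entropy(points, bins=20):
    """Entropy of a role's 2D occupancy histogram; lower = more compact role."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())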
European Conference on Computer Vision | 2014
Robert T. Collins; Peter Carr
Although ‘tracking-by-detection’ is a popular approach when reliable object detectors are available, missed detections remain a difficult hurdle to overcome. We present a hybrid stochastic/deterministic optimization scheme that uses Reversible Jump Markov Chain Monte Carlo (RJMCMC) to perform stochastic search over the space of detection configurations, interleaved with deterministic computation of the optimal multi-frame data association for each proposed detection hypothesis. Since object trajectories do not need to be estimated directly by the sampler, our approach is more efficient than traditional MCMC data association (MCMCDA) techniques. Moreover, our holistic formulation is able to generate longer, more reliable trajectories than baseline tracking-by-detection approaches in challenging multi-target scenarios.
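The hybrid loop can be sketched as follows, with a stub in place of the deterministic data-association solver (all names and the birth/death proposal scheme are illustrative, not the paper's exact moves):

import math
import random

def association_score(detections):
    """Stub for the deterministic multi-frame data association step; the
    real system computes the optimal association for each hypothesis."""
    return sum(d['score'] for d in detections)  # placeholder objective

def rjmcmc_search(detections, n_iters=1000, temperature=1.0):
    current = list(detections)
    best, best_score = current, association_score(current)
    for _ in range(n_iters):
        proposal = list(current)
        if proposal and random.random() < 0.5:
            proposal.pop(random.randrange(len(proposal)))  # death move
        else:
            proposal.append({'score': random.random()})    # birth move
        # Metropolis-style acceptance on the association objective.
        delta = association_score(proposal) - association_score(current)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = proposal
            if association_score(current) > best_score:
                best, best_score = current, association_score(current)
    return best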
Computer Vision and Pattern Recognition | 2013
Alina Bialkowski; Patrick Lucey; Peter Carr; Simon Denman; Iain A. Matthews; Sridha Sridharan
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g., GPS or RFID sensors), as they do not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players must be tracked over long periods of time (e.g., 35 minutes in field hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix up missed or false detections. However, when a human cannot intervene due to the sheer amount of data being generated, the data cannot be used because of the missing or noisy detections. In this paper, we investigate two representations based on raw player detections (and not tracking) which are robust to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can also be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
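Both representations are straightforward to compute directly from raw detections, which is why they tolerate missed and false detections; a minimal sketch (the field dimensions assume a standard field-hockey pitch, and the bin counts are arbitrary):

import numpy as np

def team_occupancy_map(detections_xy, field=(91.4, 55.0), bins=(30, 18)):
    """Quantize raw player detections (x, y in meters) into a coarse,
    normalized occupancy histogram; no tracking is required."""
    hist, _, _ = np.histogram2d(
        detections_xy[:, 0], detections_xy[:, 1],
        bins=bins, range=[[0, field[0]], [0, field[1]]])
    return hist / max(hist.sum(), 1)

def team_centroid(detections_xy):
    """Centroid of all detections; indifferent to player identity."""
    return detections_xy.mean(axis=0)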
Workshop on Applications of Computer Vision | 2015
Jianhui Chen; Peter Carr
Filming team sports is challenging because there are many points of interest which are constantly changing. Unlike previous automatic broadcasting solutions, we propose a data-driven approach for determining where a robotic pan-tilt-zoom (PTZ) camera should look. Without using any pre-defined heuristics, we learn the relationship between player locations and corresponding camera configurations by crafting features which can be derived from noisy player tracking data, and employ a new calibration algorithm to estimate the pan-tilt-zoom configuration of a human-operated broadcast camera at each video frame. Using this data, we train a regressor to predict the appropriate pan angle for new noisy input tracking data. We demonstrate our system on a high school basketball game. Our experiments show how our data-driven planning approach achieves superior performance to a state-of-the-art algorithm and does indeed mimic a human operator.
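A toy version of the data-driven planner, with synthetic data and a random-forest regressor standing in for the paper's learned model and features:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def frame_features(player_xy):
    """Features derivable from noisy tracking data: centroid and spread."""
    return np.concatenate([player_xy.mean(axis=0), player_xy.std(axis=0)])

rng = np.random.default_rng(1)
frames = [rng.uniform(0, 28, size=(10, 2)) for _ in range(2000)]  # ~basketball court, 10 players
X = np.stack([frame_features(f) for f in frames])
pan = X[:, 0] * 2.0 + rng.normal(0, 0.5, len(X))  # synthetic stand-in for logged operator pan angles

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, pan)
predicted_pan = model.predict(X[:1])  # pan angle for a new noisy frame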
International Conference on Multimedia and Expo | 2013
Christine Chen; Oliver Wang; Simon Heinzle; Peter Carr; Aljoscha Smolic; Markus H. Gross
Live sports broadcasting is seeing a large increase in the number of cameras used for filming. More cameras can provide better coverage of the field and a wider range of experiences for viewers. However, choosing optimal cameras for broadcast demands a high level of concentration, awareness and experience from sports broadcast directors. We present an automatic assistant to help select likely candidates from a large array of possible cameras. Sports directors can then choose the final broadcast camera from the reduced suggestion set. Our assistant uses both widely acknowledged cinematography guidelines for sports directing, as well as a data-driven approach that learns specific styles from directors.
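The blending of rule-based and learned scores might be sketched as follows (the scoring functions themselves are assumptions; the paper derives them from cinematography guidelines and learned director style):

def suggest_cameras(candidates, rule_score, style_score, k=3, alpha=0.5):
    """Blend cinematography-rule scores with a learned director-style
    score and return the top-k cameras for the director to pick from."""
    ranked = sorted(
        candidates,
        key=lambda cam: alpha * rule_score(cam) + (1 - alpha) * style_score(cam),
        reverse=True)
    return ranked[:k]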
ACM Multimedia | 2013
Eric Foote; Peter Carr; Patrick Lucey; Yaser Sheikh; Iain A. Matthews
Generating live broadcasts of sporting events requires a coordinated crew of camera operators, directors, and technical personnel to control and switch between multiple cameras to tell the evolving story of a game. In this paper, we present a unimodal interface concept that allows one person to cover live sporting action by controlling multiple cameras and determining which view to broadcast. The interface exploits the structure of sports broadcasts, which typically switch between a zoomed-out game-camera view (which records the strategic team-level play) and a zoomed-in iso-camera view (which captures the animated adversarial relations between opposing players). The operator simultaneously controls multiple pan-tilt-zoom cameras by pointing at a location on the touch screen, and selects which camera to broadcast using one or two points of contact. The image from the selected camera is superimposed on top of a wide-angle view captured from a context-camera, which provides the operator with peripheral information (useful for ensuring good framing while controlling the camera). We show that by unifying directorial and camera operation functions, we can achieve comparable broadcast quality to a multi-person crew, while reducing cost, logistical, and communication complexities.
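The core mapping from a touch point on the context view to a pan-tilt command can be sketched as below, under the simplifying assumption of a rectilinear context camera aligned with the PTZ camera's home pose (the real system must calibrate between the two cameras):

def touch_to_pan_tilt(u, v, width, height, hfov_deg=90.0, vfov_deg=50.0):
    """Map a touch point (u, v) in pixels on the wide-angle context view
    to approximate pan/tilt angles in degrees for a PTZ camera."""
    pan = (u / width - 0.5) * hfov_deg
    tilt = (0.5 - v / height) * vfov_deg
    return pan, tilt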
Computer Vision and Pattern Recognition | 2017
Slawomir Bak; Peter Carr
Re-identification of people in surveillance footage must cope with drastic variations in color, background, viewing angle and a person's pose. Supervised techniques are often the most effective, but require extensive annotation, which is infeasible for large camera networks. Unlike previous supervised learning approaches that require hundreds of annotated subjects, we learn a metric using a novel one-shot learning approach. We first learn a deep texture representation from intensity images with Convolutional Neural Networks (CNNs). When training a CNN using only intensity images, the learned embedding is color-invariant and shows high performance even on unseen datasets without fine-tuning. To account for differences in camera color distributions, we learn a color metric using a single pair of ColorChecker images. The proposed one-shot learning achieves performance that is competitive with supervised methods, but uses only a single example rather than the hundreds required for the fully supervised case. Compared with semi-supervised and unsupervised state-of-the-art methods, our approach yields significantly higher accuracy.
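One-shot color adaptation from a single ColorChecker pair can be illustrated with a least-squares linear transform between the corresponding patch colors of the two cameras (a simplification: the paper learns a color metric, not this plain 3x3 map):

import numpy as np

def fit_color_transform(patches_cam_a, patches_cam_b):
    """Least-squares 3x3 transform mapping camera A colors to camera B,
    estimated from one pair of ColorChecker images (24 patches each)."""
    M, _, _, _ = np.linalg.lstsq(patches_cam_a, patches_cam_b, rcond=None)
    return M  # apply as colors_a @ M

# Synthetic example: camera B applies a channel gain and offset to A.
a = np.random.rand(24, 3)                                 # mean RGB per patch, camera A
b = np.clip(a @ np.diag([1.1, 0.9, 1.0]) + 0.02, 0, 1)    # same patches, camera B
M = fit_color_transform(a, b)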