Piotr Dollár | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Piotr Dollár is active.

Explore More

Publication

Featured researches published by Piotr Dollár.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Pedestrian Detection: An Evaluation of the State of the Art

Piotr Dollár; Christian Wojek; Bernt Schiele; Pietro Perona

Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple data sets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: 1) We put together a large, well-annotated, and realistic monocular pedestrian detection data set and study the statistics of the size, position, and occlusion patterns of pedestrians in urban scenes, 2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and 3) we evaluate the performance of sixteen pretrained state-of-the-art detectors across six data sets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.

european conference on computer vision | 2014

Edge Boxes: Locating Object Proposals from Edges

C. Lawrence Zitnick; Piotr Dollár

The use of object proposals is an effective recent approach for increasing the computational efficiency of object detection. We propose a novel method for generating object bounding box proposals using edges. Edges provide a sparse yet informative representation of an image. Our main observation is that the number of contours that are wholly contained in a bounding box is indicative of the likelihood of the box containing an object. We propose a simple box objectness score that measures the number of edges that exist in the box minus those that are members of contours that overlap the box’s boundary. Using efficient data structures, millions of candidate boxes can be evaluated in a fraction of a second, returning a ranked set of a few thousand top-scoring proposals. Using standard metrics, we show results that are significantly more accurate than the current state-of-the-art while being faster to compute. In particular, given just 1000 proposals we achieve over 96% object recall at overlap threshold of 0.5 and over 75% recall at the more challenging overlap of 0.7. Our approach runs in 0.25 seconds and we additionally demonstrate a near real-time variant with only minor loss in accuracy.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Fast Feature Pyramids for Object Detection

Piotr Dollár; Ron Appel; Serge J. Belongie; Pietro Perona

Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate, and considerably faster, than the state-of-the-art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).

british machine vision conference | 2009

Integral Channel Features

Piotr Dollár; Zhuowen Tu; Pietro Perona; Serge J. Belongie

We study the performance of ‘integral channel features’ for image classification tasks, focusing in particular on pedestrian detection. The general idea behind integral channel features is that multiple registered image channels are computed using linear and non-linear transformations of the input image, and then features such as local sums, histograms, and Haar features and their various generalizations are efficiently computed using integral images. Such features have been used in recent literature for a variety of tasks – indeed, variations appear to have been invented independently multiple times. Although integral channel features have proven effective, little effort has been devoted to analyzing or optimizing the features themselves. In this work we present a unified view of the relevant work in this area and perform a detailed experimental evaluation. We demonstrate that when designed properly, integral channel features not only outperform other features including histogram of oriented gradient (HOG), they also (1) naturally integrate heterogeneous sources of information, (2) have few parameters and are insensitive to exact parameter settings, (3) allow for more accurate spatial localization during detection, and (4) result in fast detectors when coupled with cascade classifiers.

computer vision and pattern recognition | 2009

Pedestrian detection: A benchmark

Piotr Dollár; Christian Wojek; Bernt Schiele; Pietro Perona

Pedestrian detection is a key problem in computer vision, with several applications including robotics, surveillance and automotive safety. Much of the progress of the past few years has been driven by the availability of challenging public datasets. To continue the rapid rate of innovation, we introduce the Caltech Pedestrian Dataset, which is two orders of magnitude larger than existing datasets. The dataset contains richly annotated video, recorded from a moving vehicle, with challenging images of low resolution and frequently occluded people. We propose improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict performance on full images. We also benchmark several promising detection systems, providing an overview of state-of-the-art performance and a direct, unbiased comparison of existing methods. Finally, by analyzing common failure cases, we help identify future research directions for the field.

british machine vision conference | 2010

The Fastest Pedestrian Detector in the West

Piotr Dollár; Serge J. Belongie; Pietro Perona

We demonstrate a multiscale pedestrian detector operating in near real time ( 6 fps on 640x480 images) with state-of-the-art detection performance. The computational bottleneck of many modern detectors is the construction of an image pyramid, typically sampled at 8-16 scales per octave, and associated feature computations at each scale. We propose a technique to avoid constructing such a finely sampled image pyramid without sacrificing performance: our key insight is that for a broad family of features, including gradient histograms, the feature responses computed at a single scale can be used to approximate feature responses at nearby scales. The approximation is accurate within an entire scale octave. This allows us to decouple the sampling of the image pyramid from the sampling of detection scales. Overall, our approximation yields a speedup of 10-100 times over competing methods with only a minor loss in detection accuracy of about 1-2% on the Caltech Pedestrian dataset across a wide range of evaluation settings. The results are confirmed on three additional datasets (INRIA, ETH, and TUD-Brussels) where our method always scores within a few percent of the state-of-the-art while being 1-2 orders of magnitude faster. The approach is general and should be widely applicable.

computer vision and pattern recognition | 2015

From captions to visual concepts and back

Hao Fang; Saurabh Gupta; Forrest N. Iandola; Rupesh Kumar Srivastava; Li Deng; Piotr Dollár; Jianfeng Gao; Xiaodong He; Margaret Mitchell; John Platt; C. Lawrence Zitnick; Geoffrey Zweig

This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34% of the time.

Nature | 2011

Functional identification of an aggression locus in the mouse hypothalamus

Dayu Lin; Maureen P. Boyle; Piotr Dollár; Hyosang Lee; E. S. Lein; Pietro Perona; David J. Anderson

Electrical stimulation of certain hypothalamic regions in cats and rodents can elicit attack behaviour, but the exact location of relevant cells within these regions, their requirement for naturally occurring aggression and their relationship to mating circuits have not been clear. Genetic methods for neural circuit manipulation in mice provide a potentially powerful approach to this problem, but brain-stimulation-evoked aggression has never been demonstrated in this species. Here we show that optogenetic, but not electrical, stimulation of neurons in the ventromedial hypothalamus, ventrolateral subdivision (VMHvl) causes male mice to attack both females and inanimate objects, as well as males. Pharmacogenetic silencing of VMHvl reversibly inhibits inter-male aggression. Immediate early gene analysis and single unit recordings from VMHvl during social interactions reveal overlapping but distinct neuronal subpopulations involved in fighting and mating. Neurons activated during attack are inhibited during mating, suggesting a potential neural substrate for competition between these opponent social behaviours.

international conference on computer vision | 2013

Robust Face Landmark Estimation under Occlusion

Xavier P. Burgos-Artizzu; Pietro Perona; Piotr Dollár

Human faces captured in real-world conditions present large variations in shape and occlusions due to differences in pose, expression, use of accessories such as sunglasses and hats and interactions with objects (e.g. food). Current face landmark estimation approaches struggle under such conditions since they fail to provide a principled way of handling outliers. We propose a novel method, called Robust Cascaded Pose Regression (RCPR) which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features. We show that RCPR improves on previous landmark estimation methods on three popular face datasets (LFPW, LFW and HELEN). We further explore RCPRs performance by introducing a novel face dataset focused on occlusion, composed of 1,007 faces presenting a wide range of occlusion patterns. RCPR reduces failure cases by half on all four datasets, at the same time as it detects face occlusions with a 80/40% precision/recall.

computer vision and pattern recognition | 2010

Cascaded pose regression

Piotr Dollár; Peter Welinder; Pietro Perona

We present a fast and accurate algorithm for computing the 2D pose of objects in images called cascaded pose regression (CPR). CPR progressively refines a loosely specified initial guess, where each refinement is carried out by a different regressor. Each regressor performs simple image measurements that are dependent on the output of the previous regressors; the entire system is automatically learned from human annotated training examples. CPR is not restricted to rigid transformations: ‘pose’ is any parameterized variation of the objects appearance such as the degrees of freedom of deformable and articulated objects. We compare CPR against both standard regression techniques and human performance (computed from redundant human annotations). Experiments on three diverse datasets (mice, faces, fish) suggest CPR is fast (2–3ms per pose estimate), accurate (approaching human performance), and easy to train from small amounts of labeled data.

Explore More