Scott Satkin
Carnegie Mellon University
Publications
Featured research published by Scott Satkin.
international conference on computer vision | 2007
Nathan Jacobs; Scott Satkin; Nathaniel Roman; Richard Speyer; Robert Pless
A key problem in widely distributed camera networks is locating the cameras. This paper considers three scenarios for camera localization: localizing a camera in an unknown environment, adding a new camera in a region with many other cameras, and localizing a camera by finding correlations with satellite imagery. We find that simple summary statistics (the time course of principal component coefficients) are sufficient to geolocate cameras without determining correspondences between cameras or explicitly reasoning about weather in the scene. We present results from a database of images from 538 cameras collected over the course of a year. We find that for cameras that remain stationary and for which we have accurate image timestamps, we can localize most cameras to within 50 miles of the known location. In addition, we demonstrate the use of a distributed camera network in the construction of a map of weather conditions.
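The following minimal sketch illustrates the flavor of this approach rather than the paper's exact pipeline: each camera's image sequence is reduced to the time course of its first principal-component coefficient, and an unlocated camera is placed at the location of the known camera whose signal it correlates with best. The function names and the `signals`/`locations` data layout are hypothetical, and the frames are assumed to be sampled on a common timestamp grid.

```python
# Sketch: correlating per-camera summary statistics to place an unlocated
# camera near the known camera whose signal it matches best.
# Hypothetical data layout: signals[cam_id] is a 1-D array holding the
# time course of the first principal-component coefficient for that camera.
import numpy as np

def pc1_time_course(images):
    """First principal-component coefficient over time for one camera.

    `images` is an (n_frames, n_pixels) array of vectorized frames,
    assumed to be sampled on a common timestamp grid across cameras.
    """
    centered = images - images.mean(axis=0)
    # PCA via SVD: the top right singular vector is the spatial component;
    # projecting each frame onto it gives the per-frame coefficient.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def localize(unknown_signal, signals, locations):
    """Estimate a location as that of the most correlated known camera."""
    best_id, best_corr = None, -np.inf
    for cam_id, sig in signals.items():
        n = min(len(sig), len(unknown_signal))
        corr = np.corrcoef(unknown_signal[:n], sig[:n])[0, 1]
        if corr > best_corr:
            best_id, best_corr = cam_id, corr
    return locations[best_id], best_corr
```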
european conference on computer vision | 2010
Scott Satkin; Martial Hebert
In this paper, we present a framework for estimating what portions of videos are most discriminative for the task of action recognition. We explore the impact of the temporal cropping of training videos on the overall accuracy of an action recognition system, and we formalize what makes a set of croppings optimal. In addition, we present an algorithm to determine the best set of croppings for a dataset, and experimentally show that our approach increases the accuracy of various state-of-the-art action recognition techniques.
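As a rough illustration of the selection problem, and not the paper's algorithm, the sketch below brute-forces all temporal windows of a single video and keeps the one that scores best under a caller-supplied evaluation function; `evaluate` is a hypothetical callback that would retrain or re-score an action recognition system with that cropping applied.

```python
# Sketch of the selection idea: search candidate temporal croppings of a
# training video and keep the one that most improves recognition accuracy.
# `evaluate` is a hypothetical callback returning an accuracy score for a
# given (start, end) cropping; the paper's formalization is not reproduced.
from itertools import product

def best_cropping(n_frames, min_len, evaluate):
    """Return the (start, end) window of length >= min_len with the
    highest score under evaluate(start, end)."""
    best, best_score = None, float("-inf")
    for start, end in product(range(n_frames), repeat=2):
        if end - start < min_len:
            continue
        score = evaluate(start, end)
        if score > best_score:
            best, best_score = (start, end), score
    return best, best_score
```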
british machine vision conference | 2012
Scott Satkin; Jason Lin; Martial Hebert
In this paper, we propose a data-driven approach to leverage repositories of 3D models for scene understanding. Our ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from these models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint from which an image was taken, and computing a similarity measure between each 3D model and an input image. We demonstrate this data-driven approach in the context of geometry estimation and show the ability to find the identities and poses of objects in a scene. Additionally, we present a new dataset with annotated scene geometry. This data allows us to measure the performance of our algorithm in 3D, rather than in the image plane.

Recently, large online repositories of 3D data such as Google 3D Warehouse have emerged. These resources, as well as the advent of low-cost depth cameras, have sparked interest in geometric data-driven algorithms. At the same time, researchers have (re-)started investigating the feasibility of recovering geometric information, e.g., the layout of a scene. The success of data-driven techniques for tasks based on appearance features, e.g., interpreting an input image by retrieving similar scenes, suggests that similar techniques based on geometric data could be equally effective for 3D scene interpretation tasks. In fact, the motivation for data-driven techniques is the same for 3D models as for images: real-world environments are not random; the sizes, shapes, orientations, locations and co-locations of objects are constrained in complicated ways that can be represented given enough data. In principle, estimating 3D scene structure from data would help constrain bottom-up vision processes. For example, in Figure 1, one nightstand is fully visible; however, the second nightstand is almost fully occluded. Although a bottom-up detector would likely fail to identify the second nightstand since only a few pixels are visible, our method of finding the best matching 3D model is able to detect these types of occluded objects.

This is not a trivial extension of the image-based techniques. Generalizing data-driven ideas raises new fundamental technical questions never addressed before in this context: What features should be used to compare input images and 3D models? Given these features, what mechanism should be used to rank the most similar 3D models to the input scene? Even assuming that this ranking is correct, how can we transfer information from the 3D models to the input image? To address these questions, we develop a set of features that can be used to compare an input image with a 3D model and design a mechanism for finding the best matching 3D scene using support vector ranking. We show the feasibility of these techniques for transferring the geometry of objects in indoor scenes from 3D models to an input image.

Naturally, we cannot compare 3D models directly to a 2D image. Thus, we first estimate the intrinsic and extrinsic parameters of the camera and use this information to render each of the 3D models from the same view as the image was taken from. We then compute similarity features between the models and the input image. Lastly, each of the 3D models is ranked based on how similar its rendering is to the input image, using a learned feature weighting. See Figure 2 for an overview of this process.
Please read our full paper for a detailed explanation of our data-driven geometry estimation algorithm and results.
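A minimal sketch of the ranking step described above follows. The `render_from_viewpoint` and `similarity_features` helpers are hypothetical placeholders for the rendering and feature pipeline, and the learned weighting is represented as a plain linear scoring of the similarity features rather than the paper's support vector ranking machinery.

```python
# Sketch of the ranking step: each candidate 3D model is rendered from the
# estimated camera viewpoint, similarity features against the input image
# are computed, and a learned linear weighting scores each candidate.
# `render_from_viewpoint` and `similarity_features` are hypothetical helpers.
import numpy as np

def rank_models(image, models, camera, weights,
                render_from_viewpoint, similarity_features):
    """Return model indices sorted from best to worst match."""
    scores = []
    for model in models:
        rendering = render_from_viewpoint(model, camera)
        feats = similarity_features(image, rendering)   # 1-D feature vector
        scores.append(np.dot(weights, feats))           # learned weighting
    return np.argsort(scores)[::-1]
```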
international conference on computer vision | 2013
Scott Satkin; Martial Hebert
We present a new algorithm, 3DNN (3D Nearest-Neighbor), which is capable of matching an image with 3D data independently of the viewpoint from which the image was captured. By leveraging rich annotations associated with each image, our algorithm can automatically produce precise and detailed 3D models of a scene from a single image. Moreover, we can transfer information across images to accurately label and segment objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation. By decoupling the viewpoint and the geometry of an image, we develop a scene matching approach which is truly 100% viewpoint invariant, yielding state-of-the-art performance on challenging data.
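As a rough sketch of the label-transfer idea only (the interfaces below are hypothetical, not the paper's API): once the best-matching annotated 3D scene has been retrieved, each of its objects can be projected into the query view to produce per-object segmentation masks.

```python
# Rough sketch of label transfer: after retrieving the best-matching
# annotated 3D scene, project each of its objects into the query view to
# obtain per-object segmentation masks. `best_model.objects`, `obj.name`,
# and `project_object` are hypothetical interfaces used for illustration.
def transfer_labels(query_camera, best_model, project_object):
    """Yield (object_name, binary_mask) pairs for the query image."""
    for obj in best_model.objects:                 # annotated 3D objects
        mask = project_object(obj, query_camera)   # render mask in query view
        if mask.any():                             # skip fully occluded objects
            yield obj.name, mask
```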
international conference on data engineering | 2011
K.V.M. Naidu; Rajeev Rastogi; Scott Satkin; Anand Srinivasan
In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested instantiating certain intermediate aggregates which are then used to generate the final answers for input queries. In this work, we make a number of important contributions aimed at improving the execution and generation of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates, and also allows us to accurately estimate the number of evictions, (2) a comprehensive query execution cost model based on these estimates, (3) an efficient greedy heuristic for constructing good low-cost query plans, (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and Uniform, respectively, and (5) a detailed performance study with real-life IP flow data sets, which show that our multiple aggregates computation techniques consistently outperform the best-known approach.
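The sketch below illustrates the general idea of a fixed-capacity, hash-based streaming aggregate that counts evictions, the quantity a cost model of this kind needs to estimate. It is a simplified stand-in for illustration, not the paper's hashing model or cost model.

```python
# Sketch of a hash-based streaming aggregate with eviction counting.
# When the table is full and a new group key arrives, an existing entry is
# evicted (flushed downstream as a partial result) to make room; the number
# of evictions is what a query execution cost model would need to estimate.
class StreamAggregate:
    def __init__(self, capacity):
        self.capacity = capacity
        self.table = {}        # group key -> partial aggregate (a sum here)
        self.evictions = 0

    def update(self, key, value, flush):
        if key not in self.table and len(self.table) >= self.capacity:
            victim, partial = self.table.popitem()   # evict some entry
            flush(victim, partial)                   # emit partial result
            self.evictions += 1
        self.table[key] = self.table.get(key, 0) + value
```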
International Journal of Computer Vision | 2015
Scott Satkin; Maheen Rashid; Jason Lin; Martial Hebert
In this paper, we describe a data-driven approach to leverage repositories of 3D models for scene understanding. Our ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from these models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint from which an image was taken, and computing a similarity measure between each 3D model and an input image. We demonstrate this data-driven approach in the context of geometry estimation and show the ability to find the identities, poses and styles of objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation, as well as two novel applications: affordance estimation and photorealistic object insertion.
international conference on computer vision | 2009
Nathan Jacobs; Michael Dixon; Scott Satkin; Robert Pless
We consider the special case of tracking objects in highly structured scenes. In the context of vehicle tracking in urban environments, we offer a fully automatic, end-to-end system that discovers and parametrizes the lanes along which vehicles drive, then uses just these pixels to simultaneously track dozens of objects. This system includes a novel active contour energy function used to parametrize the lanes of travel based only on the accumulation of spatio-temporal image derivatives, and a tracking algorithm that exploits longer temporal constraints made possible by our compact data representation; we believe both of these may be of independent interest. We offer quantitative results comparing tracking results to ground-truthed data, including thousands of vehicles from the NGSIM Peachtree data set.
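As a rough illustration of the first step (accumulating spatio-temporal image derivatives so that lanes of travel appear as bright ridges of motion energy), here is a minimal sketch; the active-contour lane parametrization and the tracker itself are not shown, and the implementation is an assumption rather than the paper's code.

```python
# Sketch: accumulate temporal image derivatives over a video so that lanes
# of travel show up as bright ridges of motion energy. The active-contour
# fitting over this map is not shown.
import numpy as np

def motion_energy_map(frames):
    """`frames` is an iterable of equally sized 2-D grayscale arrays."""
    frames = iter(frames)
    prev = np.asarray(next(frames), dtype=np.float64)
    energy = np.zeros_like(prev)
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        energy += np.abs(frame - prev)   # temporal derivative magnitude
        prev = frame
    return energy
```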
international conference on computer graphics and interactive techniques | 2012
Mark Colbert; Jean-Yves Bouguet; Jeff Beis; Spudde Childs; Daniel Joseph Filip; Luc Vincent; Jongwoo Lim; Scott Satkin
Google Street View has provided millions of users with the ability to visually locate businesses around the world using 360° panoramic imagery. Due to the bulky, custom hardware (such as laser scanners and high-precision GPS devices) required to precisely geo-locate the imagery, Street View experiences have been limited to large areas where collection is cost-effective. This has prevented users from discovering places such as the interiors of small businesses.
computer vision and pattern recognition | 2011
Abhinav Gupta; Scott Satkin; Alexei A. Efros; Martial Hebert
Archive | 2015
Scott Satkin