Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Stefan Lee is active.

Publications


Featured research published by Stefan Lee.


International Conference on Computer Vision | 2015

Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions

Sven Bambach; Stefan Lee; David J. Crandall; Chen Yu

Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work on hand detection has relied on strong assumptions that hold only in simple scenarios, such as limited interaction with other people or controlled lab settings. We develop methods to locate and distinguish between hands in egocentric video using strong appearance models based on Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. We show how these high-quality bounding boxes can be used to create accurate pixelwise hand regions and, as an application, investigate the extent to which hand segmentation alone can distinguish between different activities. We evaluate these techniques on a new dataset of 48 first-person videos of people interacting in realistic environments, with pixel-level ground truth for over 15,000 hand instances.
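
As a rough illustration of the candidate-generation idea, here is a minimal sketch, assuming hand boxes are parameterized as normalized (cx, cy, w, h) tuples. Proposals are drawn by jittering training-set boxes, a simple stand-in for the paper's learned spatial prior; the CNN scoring stage is omitted.

```python
# Minimal sketch (not the authors' code): generating hand-region proposals by
# sampling from a density fitted to training-set hand boxes, in the spirit of
# the paper's spatial-prior candidate generation.
import numpy as np

def fit_box_prior(train_boxes):
    """train_boxes: (N, 4) array of (cx, cy, w, h) in normalized [0, 1] coords."""
    return np.asarray(train_boxes, dtype=float)

def sample_proposals(prior_boxes, n_samples, bandwidth=0.05, rng=None):
    """Draw proposals by jittering randomly chosen training boxes (KDE-style)."""
    rng = rng or np.random.default_rng(0)
    idx = rng.integers(0, len(prior_boxes), size=n_samples)
    noise = rng.normal(scale=bandwidth, size=(n_samples, 4))
    return np.clip(prior_boxes[idx] + noise, 0.0, 1.0)  # rows to be scored by a CNN

# Toy usage: "training" hand boxes near the bottom of the frame, where
# egocentric hands tend to appear.
train = fit_box_prior([[0.3, 0.8, 0.2, 0.2], [0.7, 0.85, 0.25, 0.2]])
print(sample_proposals(train, 5))
```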


Computer Vision and Pattern Recognition | 2014

This Hand Is My Hand: A Probabilistic Approach to Hand Disambiguation in Egocentric Video

Stefan Lee; Sven Bambach; David J. Crandall; John M. Franchak; Chen Yu

Egocentric cameras are becoming more popular, introducing increasing volumes of video in which the biases and framing of traditional photography are replaced with those of natural viewing tendencies. This paradigm enables new applications, including novel studies of social interaction and human development. Recent work has focused on identifying the camera wearer's hands as a first step towards more complex analysis. In this paper, we study how to disambiguate and track not only the observer's hands but also those of social partners. We present a probabilistic framework for modeling paired interactions that incorporates the spatial, temporal, and appearance constraints inherent in egocentric video. We test our approach on a dataset of over 30 minutes of video from six pairs of subjects.
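
A minimal sketch of the disambiguation idea, not the paper's actual model: a posterior over four hand types is formed by combining an assumed per-type spatial prior with appearance likelihoods that a detector would supply. The priors and labels here are illustrative.

```python
# Minimal sketch (assumptions, not the paper's model): disambiguating detected
# hands by combining a per-type spatial prior with an appearance likelihood,
# echoing the paper's spatial and appearance constraints.
import numpy as np

LABELS = ["own-left", "own-right", "partner-left", "partner-right"]

# Assumed 2D Gaussian spatial priors (mean cx, cy in normalized coords).
MEANS = {"own-left": (0.3, 0.9), "own-right": (0.7, 0.9),
         "partner-left": (0.65, 0.4), "partner-right": (0.35, 0.4)}
VAR = 0.03

def spatial_prior(center, label):
    d = np.subtract(center, MEANS[label])
    return np.exp(-0.5 * (d @ d) / VAR)

def posterior(center, appearance_scores):
    """appearance_scores: dict label -> detector likelihood (assumed given)."""
    scores = np.array([spatial_prior(center, l) * appearance_scores[l] for l in LABELS])
    return dict(zip(LABELS, scores / scores.sum()))

print(posterior((0.68, 0.88), {l: 0.25 for l in LABELS}))  # spatial prior dominates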


Workshop on Applications of Computer Vision | 2015

Predicting Geo-informative Attributes in Large-Scale Image Collections Using Convolutional Neural Networks

Stefan Lee; Haipeng Zhang; David J. Crandall

Geographic location is a powerful property for organizing large-scale photo collections, but only a small fraction of online photos are geo-tagged. Most work in automatically estimating geo-tags from image content is based on comparison against models of buildings or landmarks, or on matching to large reference collections of geotagged images. These approaches work well for frequently photographed places like major cities and tourist destinations, but fail for photos taken in sparsely photographed places where few reference photos exist. Here we consider how to recognize general geo-informative attributes of a photo, e.g. the elevation gradient, population density, demographics, etc. of where it was taken, instead of trying to estimate a precise geo-tag. We learn models for these attributes using a large (noisy) set of geo-tagged images from Flickr by training deep convolutional neural networks (CNNs). We evaluate on over a dozen attributes, showing that while automatically recognizing some attributes is very difficult, others can be automatically estimated with about the same accuracy as a human.
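
A minimal sketch of how attribute labels could be derived from geo-tags, under the assumption that each attribute is available as a global raster (e.g. elevation); the raster here is random toy data, and the CNN training itself is omitted.

```python
# Minimal sketch (assumed data layout): turning noisy Flickr geo-tags into
# discrete attribute labels for CNN training, as the paper does for attributes
# like elevation or population density. The raster lookup is a stand-in.
import numpy as np

def attribute_label(lat, lon, raster, thresholds=(500.0,)):
    """Look up an attribute (e.g., elevation in meters) at a geo-tag and bin it."""
    row = int((90.0 - lat) / 180.0 * (raster.shape[0] - 1))
    col = int((lon + 180.0) / 360.0 * (raster.shape[1] - 1))
    value = raster[row, col]
    return int(np.digitize(value, thresholds))  # class index for the CNN target

# Toy global "elevation" raster; real work would use an actual elevation grid.
raster = np.random.default_rng(0).uniform(0, 3000, size=(180, 360))
print(attribute_label(39.17, -86.53, raster))  # label for a Bloomington, IN photo
```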


International Conference on Computational Photography | 2015

Linking Past to Present: Discovering Style in Two Centuries of Architecture

Stefan Lee; Nicolas Maisonneuve; David J. Crandall; Alexei A. Efros; Josef Sivic

With vast quantities of imagery now available online, researchers have begun to explore whether visual patterns can be discovered automatically. Here we consider the particular domain of architecture, using huge collections of street-level imagery to find visual patterns that correspond to semantic-level architectural elements distinctive to particular time periods. We use this analysis both to date buildings and to discover how functionally similar architectural elements (e.g. windows, doors, balconies) have changed over time due to evolving styles. We validate the methods by combining a large dataset of nearly 150,000 Google Street View images from Paris with a cadastre map to infer an approximate construction date for each facade. Not only could our analysis be used for dating or geo-localizing buildings based on architectural features, but it could also give architects and historians new tools for confirming known theories or even discovering new ones.
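
A minimal sketch of the dating step, assuming per-period visual-element detectors have already been trained: a facade's construction period is read off by pooling detector responses over its patches. The period bins and scores are hypothetical.

```python
# Minimal sketch (hypothetical features): dating a facade by pooling the
# responses of per-period visual-element detectors over its patches, in the
# spirit of the paper's style-dating analysis.
import numpy as np

PERIODS = ["pre-1800", "1800-1850", "1850-1900", "1900-1950", "post-1950"]

def date_facade(patch_scores):
    """patch_scores: (n_patches, n_periods) detector responses (assumed given)."""
    pooled = patch_scores.mean(axis=0)  # average evidence per period
    return PERIODS[int(np.argmax(pooled))], pooled

scores = np.random.default_rng(1).random((40, len(PERIODS)))  # toy responses
print(date_facade(scores)[0])
```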


International Conference on Image Processing | 2014

Estimating bedrock and surface layer boundaries and confidence intervals in ice sheet radar imagery using MCMC

Stefan Lee; Jerome E. Mitchell; David J. Crandall; Geoffrey C. Fox

Climate models that predict polar ice sheet behavior require accurate measurements of the bedrock-ice and ice-air boundaries in ground-penetrating radar imagery. Identifying these features is typically performed by hand, which can be tedious and error prone. We propose an approach for automatically estimating layer boundaries by viewing this task as a probabilistic inference problem. Our solution uses Markov-Chain Monte Carlo to sample from the joint distribution over all possible layers conditioned on an image. Layer boundaries can then be estimated from the expectation over this distribution, and confidence intervals can be estimated from the variance of the samples. We evaluate the method on 560 echograms collected in Antarctica, and compare to a state-of-the-art technique with respect to hand-labeled images. These experiments show an approximately 50% reduction in error for tracing both bedrock and surface layers.
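
A minimal sketch of the MCMC formulation under assumed energy terms: Metropolis-Hastings samples per-column boundary rows using an image-evidence term plus a smoothness prior, and the sample set yields both a mean boundary and a confidence band, as the abstract describes.

```python
# Minimal sketch (assumed energy terms, not the paper's exact model):
# Metropolis-Hastings over a layer boundary (one row index per image column)
# with an edge-strength likelihood and a smoothness prior; the samples give a
# mean boundary and a 95% confidence band.
import numpy as np

def log_prob(boundary, edge_strength, smooth=1.0):
    like = edge_strength[boundary, np.arange(len(boundary))].sum()
    prior = -smooth * np.abs(np.diff(boundary)).sum()
    return like + prior

def mcmc_boundary(edge_strength, n_iters=5000, rng=None):
    rng = rng or np.random.default_rng(0)
    H, W = edge_strength.shape
    b = np.full(W, H // 2)
    lp = log_prob(b, edge_strength)
    samples = []
    for _ in range(n_iters):
        prop = b.copy()
        col = rng.integers(W)                     # perturb one column's boundary row
        prop[col] = np.clip(prop[col] + rng.integers(-3, 4), 0, H - 1)
        lp_new = log_prob(prop, edge_strength)
        if np.log(rng.random()) < lp_new - lp:    # Metropolis accept/reject
            b, lp = prop, lp_new
        samples.append(b.copy())
    s = np.array(samples[n_iters // 2:])          # discard burn-in
    return s.mean(axis=0), np.percentile(s, [2.5, 97.5], axis=0)

edges = np.random.default_rng(2).random((50, 30))  # toy per-pixel edge evidence
mean, ci = mcmc_boundary(edges, n_iters=2000)
```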


Large-Scale Visual Geo-Localization | 2016

Recognizing Landmarks in Large-Scale Social Image Collections

David J. Crandall; Yunpeng Li; Stefan Lee; Daniel P. Huttenlocher

The dramatic growth of social media websites over the last few years has created huge collections of online images and raised new challenges in organizing them effectively. One particularly intuitive way of browsing and searching images is by the geo-spatial location of where on Earth they were taken, but most online images do not have GPS metadata associated with them. We consider the problem of recognizing popular landmarks in large-scale datasets of unconstrained consumer images by formulating a classification problem involving nearly 2 million images and 500 categories. The dataset and categories are formed automatically from geo-tagged photos from Flickr by looking for peaks in the spatial geo-tag distribution corresponding to frequently photographed landmarks. We learn models for these landmarks with a multiclass support vector machine, using classic vector-quantized interest point descriptors as features. We also incorporate the nonvisual metadata available on modern photo-sharing sites, showing that textual tags and temporal constraints lead to significant improvements in classification rate. Finally, we apply recent breakthroughs in deep learning with Convolutional Neural Networks, finding that these models can dramatically outperform the traditional recognition approaches to this problem, and even beat human observers in some cases. (This is an expanded and updated version of an earlier conference paper [23]).
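
A minimal sketch of the category-formation step, using simple grid binning as a stand-in for the paper's peak detection in the spatial geo-tag distribution; the cell size and photo threshold are illustrative.

```python
# Minimal sketch (simplified peak finding): forming landmark categories by
# binning photo geo-tags and keeping dense bins, a stand-in for the paper's
# peak detection in the geo-tag distribution.
import numpy as np

def landmark_bins(lats, lons, cell_deg=0.01, min_photos=100):
    counts = {}
    for lat, lon in zip(lats, lons):
        key = (round(lat / cell_deg), round(lon / cell_deg))
        counts[key] = counts.get(key, 0) + 1
    return {k: v for k, v in counts.items() if v >= min_photos}

# Toy usage: a cluster of geo-tags near the Eiffel Tower plus scattered noise.
rng = np.random.default_rng(3)
lats = np.concatenate([48.8584 + rng.normal(0, 0.002, 300), rng.uniform(-90, 90, 50)])
lons = np.concatenate([2.2945 + rng.normal(0, 0.002, 300), rng.uniform(-180, 180, 50)])
print(landmark_bins(lats, lons))
```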


Computer Vision and Pattern Recognition | 2017

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Qing Sun; Stefan Lee; Dhruv Batra

Beam Search (BS) is a widely used approximate inference algorithm for decoding sequences from unidirectional neural sequence models, but approximate inference in bidirectional models remains an open problem despite their significant advantage in modeling information from both the past and the future. We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models: Bidirectional Beam Search (BiBS), an efficient extension of BS that reasons about both forward and backward time dependencies. To evaluate our method, and as an interesting problem in its own right, we introduce a novel Fill-in-the-Blank Image Captioning task, which requires reasoning about both past and future sentence structure to reconstruct sensible image descriptions. We use this task, as well as the Visual Madlibs dataset, to demonstrate the effectiveness of our approach, which consistently outperforms all baseline methods.
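
A minimal sketch of the core intuition, not the BiBS algorithm itself: for a single-token blank, candidate fills are ranked by combining forward and backward model scores. The toy bigram tables are assumptions; the real method alternates beam updates over full sequences.

```python
# Minimal sketch (simplified, single-token blank): ranking candidate fills by
# the product of forward and backward sequence-model scores, illustrating why
# both past and future context matter in fill-in-the-blank decoding.

def fill_blank(left_word, right_word, vocab, p_fwd, p_bwd):
    """p_fwd[w1][w2] = P(w2 | w1 precedes); p_bwd[w1][w2] = P(w2 | w1 follows)."""
    scores = {w: p_fwd[left_word].get(w, 1e-9) * p_bwd[right_word].get(w, 1e-9)
              for w in vocab}
    return max(scores, key=scores.get), scores

# Toy bigram "models" (assumed numbers, for illustration only).
vocab = ["dog", "frisbee", "sky"]
p_fwd = {"a": {"dog": 0.5, "frisbee": 0.3, "sky": 0.2}}
p_bwd = {"catches": {"dog": 0.8, "frisbee": 0.1, "sky": 0.1}}
print(fill_blank("a", "catches", vocab, p_fwd, p_bwd)[0])  # "a ___ catches" -> dog
```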


International Conference on Computer Vision | 2017

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Abhishek Das; Satwik Kottur; José M. F. Moura; Stefan Lee; Dhruv Batra
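
Since no abstract is shown here, the following is a heavily simplified sketch of the training signal such cooperative agents might use, assuming a REINFORCE-style policy-gradient update where the reward reflects improvement in a downstream guessing objective after a question-answer round; all agent internals are stubs.

```python
# Heavily simplified sketch (assumptions throughout): one REINFORCE-style
# policy-gradient step for a dialog agent choosing among candidate actions,
# rewarded by progress on the cooperative task after the exchange.
import numpy as np

def reinforce_update(logits, action, reward, lr=0.1):
    """One policy-gradient step on a categorical action's logits."""
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    grad = -probs; grad[action] += 1.0  # d log pi(action) / d logits
    return logits + lr * reward * grad

logits = np.zeros(4)                    # e.g., 4 candidate questions
# Reward: how much the guess improved after this question-answer round.
logits = reinforce_update(logits, action=2, reward=0.3)
print(logits)
```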


arXiv: Artificial Intelligence | 2017

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

Ashwin K. Vijayakumar; Michael Cogswell; Ramprasaath R. Selvaraju; Qing Sun; Stefan Lee; David J. Crandall; Dhruv Batra
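
No abstract is shown here, so the following is a minimal sketch of the diversity mechanism the title refers to, as commonly described for Diverse Beam Search: the beam is split into groups, and later groups are penalized for choosing tokens earlier groups already picked. The scoring model is a toy distribution and each group uses beam width 1.

```python
# Minimal sketch (toy scoring model): the group-wise diversity penalty at the
# heart of Diverse Beam Search, shown for a single decoding step.
import numpy as np

def diverse_step(log_probs, n_groups=3, diversity=1.5):
    """log_probs: (vocab,) scores for the next token; returns one pick per group."""
    counts = np.zeros_like(log_probs)
    picks = []
    for _ in range(n_groups):
        tok = int(np.argmax(log_probs - diversity * counts))  # penalized argmax
        picks.append(tok)
        counts[tok] += 1.0  # hamming-style diversity term
    return picks

lp = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
print(diverse_step(lp))  # [0, 1, 2]: later groups avoid earlier groups' tokens
```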


arXiv: Computer Vision and Pattern Recognition | 2015

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

Stefan Lee; Senthil Purushwalkam; Michael Cogswell; David J. Crandall; Dhruv Batra
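
No abstract is shown here, so the following is a minimal sketch of the multi-head idea suggested by the title, assuming a shared trunk with M prediction heads trained with an oracle set-loss, so only each example's best head receives gradient and the heads are pushed to specialize; the architecture shown is hypothetical.

```python
# Minimal sketch (assumed architecture): a shared trunk with M heads trained
# with an oracle set-loss, so only the lowest-loss head per example gets
# gradient, encouraging a diverse ensemble.
import torch
import torch.nn as nn

class MHeadNet(nn.Module):
    def __init__(self, n_heads=4, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(64, n_classes) for _ in range(n_heads))

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]

def oracle_loss(outputs, target):
    """Backpropagate only through each example's best head (min cross-entropy)."""
    losses = torch.stack([nn.functional.cross_entropy(o, target, reduction="none")
                          for o in outputs])  # (n_heads, batch)
    return losses.min(dim=0).values.mean()

net = MHeadNet()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = oracle_loss(net(x), y)
loss.backward()
```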

Collaboration


Dive into Stefan Lee's collaborations.

Top Co-Authors

Dhruv Batra (Georgia Institute of Technology)
David J. Crandall (Indiana University Bloomington)
Devi Parikh (Georgia Institute of Technology)
Abhishek Das (Georgia Institute of Technology)
Dong-Ju Shin (University of Connecticut)
Alexei A. Efros (Massachusetts Institute of Technology)