
Publications


Featured research published by David F. Fouhey.


Computer Vision and Pattern Recognition | 2015

Designing deep networks for surface normal estimation

Xiaolong Wang; David F. Fouhey; Abhinav Gupta

In the past few years, convolutional neural networks (CNNs) have shown incredible promise for learning visual representations. In this paper, we use CNNs for the task of predicting surface normals from a single image. But what is the right architecture? We propose to build upon the decades of hard work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation. We show that incorporating several constraints (man-made, Manhattan world) and meaningful intermediate representations (room layout, edge labels) in the architecture leads to state-of-the-art performance on surface normal estimation. We also show that our network is quite robust, achieving state-of-the-art results on other datasets without any fine-tuning.
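
The abstract leaves the exact architecture unspecified. As a rough illustration of the core recipe only (per-pixel normal regression with a fully convolutional network, unit-length normalization, and a cosine loss), a minimal PyTorch sketch might look like the following; the layers, sizes, and loss here are assumptions for illustration, not the paper's network, which additionally injects room-layout and edge-label representations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalNet(nn.Module):
    """Toy fully convolutional network regressing a per-pixel
    surface normal (3 channels) from an RGB image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        n = self.decoder(self.encoder(x))
        return F.normalize(n, dim=1)  # normals must have unit length

def cosine_loss(pred, gt):
    # 1 - cos(angle between predicted and true normal), averaged over pixels.
    return (1.0 - (pred * gt).sum(dim=1)).mean()

model = NormalNet()
img = torch.randn(1, 3, 64, 64)                     # dummy RGB input
gt = F.normalize(torch.randn(1, 3, 64, 64), dim=1)  # dummy ground truth
cosine_loss(model(img), gt).backward()
```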


European Conference on Computer Vision | 2016

Learning a Predictable and Generative Vector Representation for Objects

Rohit Girdhar; David F. Fouhey; Mikel Rodriguez; Abhinav Gupta

What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable. This enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval. Extensive experimental analysis demonstrates the usefulness and versatility of this embedding.
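
As a minimal sketch of the two-component structure the abstract describes: a 3D voxel autoencoder supplies the generative half, an image network regressing into the same latent space supplies the predictable half, and the two are tied together by the training objective. All layer sizes and loss terms below are arbitrary illustrations; the actual networks are substantially deeper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelAutoencoder(nn.Module):
    """Generative half: 32^3 occupancy grid -> latent code -> grid."""
    def __init__(self, dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(), nn.Linear(32 * 8 ** 3, dim),
        )
        self.dec = nn.Sequential(
            nn.Linear(dim, 32 * 8 ** 3), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8, 8)),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, v):
        z = self.enc(v)
        return z, self.dec(z)

class ImageToEmbedding(nn.Module):
    """Predictable half: RGB image -> the same latent space."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(),   # 64 -> 16
            nn.Conv2d(32, 64, 5, stride=4, padding=2), nn.ReLU(),  # 16 -> 4
            nn.Flatten(), nn.Linear(64 * 4 * 4, dim),
        )

    def forward(self, img):
        return self.net(img)

ae, im2z = VoxelAutoencoder(), ImageToEmbedding()
voxels = torch.rand(2, 1, 32, 32, 32)  # dummy occupancy grids
images = torch.randn(2, 3, 64, 64)     # dummy rendered views
z, recon = ae(voxels)
# Reconstruct voxels (generative) and pull the image embedding toward
# the voxel embedding (predictable from 2D).
loss = F.binary_cross_entropy(recon, voxels) + F.mse_loss(im2z(images), z.detach())
loss.backward()
```

At inference time, one can embed an image and decode the predicted code into voxels (single-image voxel prediction), or compare codes directly (3D model retrieval).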


International Conference on Computer Vision | 2013

Data-Driven 3D Primitives for Single Image Understanding

David F. Fouhey; Abhinav Gupta; Martial Hebert

What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
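
The paper's discovery procedure is a discriminative training loop; as a crude stand-in that only illustrates the two selection criteria (visually discriminative, geometrically informative), one could cluster patches by appearance and score each cluster for appearance tightness and agreement among its members' surface normals. Everything below, data, features, and scores, is a hypothetical illustration rather than the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 128))  # stand-in patch appearance descriptors
normals = rng.normal(size=(5000, 3))     # stand-in per-patch surface normals
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# Candidate primitives: appearance clusters of training patches.
km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(features)

def geometric_score(cluster_normals):
    """Length of the mean normal: near 1 when members agree on
    geometry (informative), near 0 when they do not."""
    return np.linalg.norm(cluster_normals.mean(axis=0))

def appearance_score(cluster_feats, center):
    """Negative mean distance to the cluster center: a crude proxy
    for how visually distinctive (tight) the cluster is."""
    return -np.linalg.norm(cluster_feats - center, axis=1).mean()

scored = []
for k in range(km.n_clusters):
    m = km.labels_ == k
    scored.append((geometric_score(normals[m])
                   + appearance_score(features[m], km.cluster_centers_[k]), k))

# Keep the top-scoring clusters as the discovered primitives.
primitives = [k for _, k in sorted(scored, reverse=True)[:10]]
print("selected primitive clusters:", primitives)
```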


International Journal of Computer Vision | 2014

People Watching: Human Actions as a Cue for Single View Geometry

David F. Fouhey; Vincent Delaitre; Abhinav Gupta; Alexei A. Efros; Ivan Laptev; Josef Sivic

We present an approach which exploits the coupling between human actions and scene geometry to use human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve single-view 3D scene understanding approaches. The proposed method is validated on monocular time-lapse sequences from YouTube and still images of indoor scenes gathered from the Internet. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.


European Conference on Computer Vision | 2012

Scene semantics from long-term observation of people

Vincent Delaitre; David F. Fouhey; Ivan Laptev; Josef Sivic; Abhinav Gupta; Alexei A. Efros

Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim of recognizing objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube, providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.
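
As a toy illustration of such a functional description, one can concatenate an object's appearance features with a histogram of the body-pose types estimated near it over a long observation window, then train a discriminative classifier. The features, pose types, and data below are hypothetical stand-ins, not the paper's model.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n = 600
appearance = rng.normal(size=(n, 64))            # stand-in visual features
pose_hist = rng.dirichlet(np.ones(10), size=n)   # usage: 10 pose types
labels = rng.integers(0, 3, size=n)              # e.g., sofa / table / chair

# Functional description: what the object looks like + how people use it.
X = np.hstack([appearance, pose_hist])

clf = LinearSVC(C=1.0, max_iter=10000).fit(X[:500], labels[:500])
print("held-out accuracy:", clf.score(X[500:], labels[500:]))
```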


European Conference on Computer Vision | 2012

People Watching: Human Actions as a Cue for Single View Geometry

David F. Fouhey; Vincent Delaitre; Abhinav Gupta; Alexei A. Efros; Ivan Laptev; Josef Sivic

We present an approach which exploits the coupling between human actions and scene geometry. We investigate the use of human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints about the scene. These constraints are then used to improve state-of-the-art single-view 3D scene understanding approaches. The proposed method is validated on a collection of monocular time-lapse sequences collected from YouTube and a dataset of still images of indoor scenes. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.


Computer Vision and Pattern Recognition | 2014

Predicting Object Dynamics in Scenes

David F. Fouhey; C. Lawrence Zitnick

Given a static scene, a human can trivially enumerate the myriad things that can happen next and characterize the relative likelihood of each. In the process, we make use of enormous amounts of commonsense knowledge about how the world works. In this paper, we investigate learning this commonsense knowledge from data. To overcome a lack of densely annotated spatiotemporal data, we learn from sequences of abstract images gathered using crowdsourcing. The abstract scenes provide both object location and attribute information. We demonstrate qualitatively and quantitatively that our models produce plausible scene predictions on both abstract images and natural images taken from the Internet.


European Conference on Computer Vision | 2014

Unfolding an Indoor Origami World

David F. Fouhey; Abhinav Gupta; Martial Hebert

In this work, we present a method for single-view reasoning about 3D surfaces and their relationships. We propose the use of mid-level constraints for 3D scene understanding in the form of convex and concave edges and introduce a generic framework capable of incorporating these and other constraints. Our method takes a variety of cues and uses them to infer a consistent interpretation of the scene. We demonstrate improvements over the state of the art and produce interpretations of the scene that link large planar surfaces.


International Conference on Pattern Recognition | 2010

Multiple Plane Detection in Image Pairs Using J-Linkage

David F. Fouhey; Daniel Scharstein; Amy J. Briggs

We present a new method for the robust detection and matching of multiple planes in pairs of images. Such planes can serve as stable landmarks for vision-based urban navigation. Our approach starts from SIFT matches and generates multiple local homography hypotheses using the recent J-linkage technique by Toldo and Fusiello, a robust randomized multi-model estimation algorithm. These hypotheses are then globally merged, spatially analyzed, robustly fitted, and checked for stability. When tested on more than 30,000 image pairs taken from panoramic views of a college campus, our method yields no false positives and recovers 72% of the matchable building walls identified by a human, despite significant occlusions and viewpoint changes.
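
J-linkage is not part of standard vision libraries, so a condensed sketch of the hypothesize-then-cluster idea may help: sample minimal sets of matches to generate homography hypotheses, record which hypotheses each match prefers, then agglomeratively merge matches whose preference sets are similar under Jaccard distance. Synthetic correspondences stand in for SIFT matches, and the hypothesis count and inlier threshold are illustrative; OpenCV is used only to fit the 4-point homographies.

```python
import numpy as np
import cv2

def apply_h(H, pts):
    q = (H @ np.c_[pts, np.ones(len(pts))].T).T
    return q[:, :2] / q[:, 2:3]

# Synthetic stand-in for SIFT matches: two planes, each inducing its
# own homography between the image pair.
rng = np.random.default_rng(2)
H1 = np.array([[1.1, 0.02, 5.0], [0.01, 1.0, -3.0], [1e-4, 0.0, 1.0]])
H2 = np.array([[0.9, -0.05, 40.0], [0.03, 1.1, 10.0], [0.0, 2e-4, 1.0]])
p1 = rng.uniform(0, 400, size=(60, 2))
p2 = rng.uniform(0, 400, size=(60, 2))
src = np.vstack([p1, p2])
dst = np.vstack([apply_h(H1, p1), apply_h(H2, p2)])

# 1. Random minimal-sample hypotheses; each match's preference set is
#    the set of hypotheses that fit it within a threshold.
n_hyp, thresh = 300, 2.0
prefs = np.zeros((len(src), n_hyp), dtype=bool)
for j in range(n_hyp):
    idx = rng.choice(len(src), 4, replace=False)
    H, _ = cv2.findHomography(src[idx], dst[idx], 0)
    if H is None:
        continue  # degenerate minimal sample
    with np.errstate(all="ignore"):
        err = np.linalg.norm(apply_h(H, src) - dst, axis=1)
        prefs[:, j] = err < thresh

# 2. J-linkage-style clustering: repeatedly merge the two clusters with
#    the smallest Jaccard distance between their preference sets.
clusters = [{i} for i in range(len(src))]
cprefs = [p.copy() for p in prefs]
while True:
    best = None
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            inter = (cprefs[a] & cprefs[b]).sum()
            union = (cprefs[a] | cprefs[b]).sum()
            if inter:
                d = 1.0 - inter / union
                if best is None or d < best[0]:
                    best = (d, a, b)
    if best is None:
        break  # no two clusters share a hypothesis: stop merging
    _, a, b = best
    clusters[a] |= clusters[b]
    cprefs[a] &= cprefs[b]  # a merged cluster keeps the shared hypotheses
    del clusters[b], cprefs[b]

print(sorted(len(c) for c in clusters if len(c) > 5))  # roughly [60, 60]
```

Matches on the same plane prefer the same all-inlier hypotheses and merge, while matches from different planes share essentially none, so merging stops with one cluster per plane plus outlier singletons. The paper's full pipeline then merges, spatially analyzes, refits, and stability-checks these clusters.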


International Conference on Computer Vision | 2015

Single Image 3D without a Single 3D Image

David F. Fouhey; Wajahat Hussain; Abhinav Gupta; Martial Hebert

Do we really need 3D labels in order to learn how to predict 3D? In this paper, we show that one can learn a mapping from appearance to 3D properties without ever seeing a single explicit 3D label. Rather than use explicit supervision, we use the regularity of indoor scenes to learn the mapping in a completely unsupervised manner. We demonstrate this on both a standard 3D scene understanding dataset as well as Internet images for which 3D is unavailable, precluding supervised learning. Despite never seeing a 3D label, our method produces competitive results.

Collaboration


Dive into David F. Fouhey's collaborations.

Top Co-Authors

Abhinav Gupta (Carnegie Mellon University)
Jitendra Malik (University of California)
Martial Hebert (Carnegie Mellon University)
Saurabh Gupta (University of California)
Josef Sivic (École Normale Supérieure)
Vincent Delaitre (École Normale Supérieure)
Rohit Girdhar (Carnegie Mellon University)
Sergey Levine (University of California)