Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Fisher Yu is active.

Publication


Featured research published by Fisher Yu.


computer vision and pattern recognition | 2015

3D ShapeNets: A deep representation for volumetric shapes

Zhirong Wu; Shuran Song; Aditya Khosla; Fisher Yu; Linguang Zhang; Xiaoou Tang; Jianxiong Xiao

3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet - a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation enables significant performance improvement over the state of the art in a variety of tasks.
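
The key input representation is a binary occupancy grid over a fixed voxel resolution. As a rough, hypothetical illustration of that representation (not the Convolutional Deep Belief Network itself), the sketch below voxelizes a point set sampled from a CAD surface into a 30x30x30 binary volume; the grid size, padding, and normalization scheme are assumptions for illustration.

```python
import numpy as np

def voxelize(points, grid_size=30, pad=2):
    """Map an (N, 3) point set (e.g. sampled from a CAD mesh surface)
    onto a binary occupancy grid of shape (grid_size, grid_size, grid_size)."""
    points = np.asarray(points, dtype=np.float64)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    # Normalize the shape into the grid, leaving a small padding border.
    scale = (grid_size - 2 * pad - 1) / float(max(maxs - mins))
    idx = np.floor((points - mins) * scale).astype(int) + pad
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1   # mark voxels as occupied
    return grid

# Example: a synthetic point cloud becomes a 30^3 binary volume.
cloud = np.random.rand(2048, 3) * np.array([1.0, 0.5, 0.3])
volume = voxelize(cloud)
print(volume.shape, int(volume.sum()), "occupied voxels")
```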


computer vision and pattern recognition | 2017

Semantic Scene Completion from a Single Depth Image

Shuran Song; Fisher Yu; Andy Zeng; Angel X. Chang; Manolis Savva; Thomas A. Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our network uses a dilation-based 3D context module to efficiently expand the receptive field and enable 3D context learning. To train our network, we construct SUNCG - a manually created large-scale dataset of synthetic 3D scenes with dense volumetric annotations. Our experiments demonstrate that the joint model outperforms methods addressing each task in isolation and outperforms alternative approaches on the semantic scene completion task. The dataset and code are available at http://sscnet.cs.princeton.edu.
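
The central architectural ingredient is the dilation-based 3D context module. A minimal, hypothetical sketch of that idea follows: stacked 3D convolutions with growing dilation enlarge the receptive field without shrinking the voxel grid, and a 1x1x1 head emits per-voxel class logits so occupancy and semantics are predicted jointly. Channel widths, dilation rates, and the class count are assumptions, not the SSCNet configuration.

```python
import torch
import torch.nn as nn

class Dilated3DContext(nn.Module):
    """Sketch of a dilation-based 3D context module: stacked 3D convolutions
    with growing dilation enlarge the receptive field while keeping the
    voxel grid resolution unchanged."""
    def __init__(self, in_ch=1, width=16, num_classes=12):
        super().__init__()
        layers, ch = [], in_ch
        for d in (1, 2, 4):  # growing dilation -> larger 3D context
            layers += [nn.Conv3d(ch, width, kernel_size=3, padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
            ch = width
        self.body = nn.Sequential(*layers)
        # One label per voxel; class 0 can stand for "empty", so occupancy
        # and semantics are predicted together.
        self.head = nn.Conv3d(width, num_classes, kernel_size=1)

    def forward(self, vol):               # vol: (B, 1, D, H, W) occupancy/TSDF volume
        return self.head(self.body(vol))  # (B, num_classes, D, H, W) logits

logits = Dilated3DContext()(torch.randn(1, 1, 32, 32, 32))
print(logits.shape)  # torch.Size([1, 12, 32, 32, 32])
```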


computer vision and pattern recognition | 2017

Dilated Residual Networks

Fisher Yu; Vladlen Koltun; Thomas A. Funkhouser

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible. Such loss of spatial acuity can limit image classification accuracy and complicate the transfer of the model to downstream applications that require detailed scene understanding. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. We show that dilated residual networks (DRNs) outperform their non-dilated counterparts in image classification without increasing the model's depth or complexity. We then study gridding artifacts introduced by dilation, develop an approach to removing these artifacts (degridding), and show that this further increases the performance of DRNs. In addition, we show that the accuracy advantage of DRNs is further magnified in downstream applications such as object localization and semantic segmentation.
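
The dilation trick is easy to see in isolation: replace striding with dilated 3x3 convolutions so the feature map keeps its resolution while each neuron keeps its receptive field. Below is a minimal sketch of one such residual block; the width and dilation rate are illustrative, and the paper's degridding stages are not shown.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Sketch of the idea behind DRNs: later residual blocks keep full
    resolution and use dilated 3x3 convolutions instead of striding, so the
    receptive field of each neuron is preserved."""
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual connection, resolution unchanged

x = torch.randn(1, 64, 56, 56)
print(DilatedResidualBlock()(x).shape)  # spatial size 56x56 is preserved
```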


computer vision and pattern recognition | 2017

Scribbler: Controlling Deep Image Synthesis with Sketch and Color

Patsorn Sangkloy; Jingwan Lu; Chen Fang; Fisher Yu; James Hays

Several recent works have used deep convolutional networks to generate realistic imagery. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces. In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces. We demonstrate a sketch-based image synthesis system which allows users to scribble over the sketch to indicate preferred colors for objects. Our network can then generate convincing images that satisfy both the color and sketch constraints of the user. The network is feed-forward, which allows users to see the effect of their edits in real time. We compare to recent work on sketch-to-image synthesis and show that our approach generates more realistic, diverse, and controllable outputs. The architecture is also effective at user-guided colorization of grayscale images.
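
Conceptually, the generator is conditioned on two stacked control signals: a sketch channel and sparse color-stroke channels. The sketch below is a heavily simplified, hypothetical stand-in for such a conditional generator; layer sizes are arbitrary, and the adversarial and content losses used for training are omitted.

```python
import torch
import torch.nn as nn

class SketchColorGenerator(nn.Module):
    """Toy conditional generator: a sketch (1 channel) and sparse color
    strokes (3 channels) are concatenated channel-wise and mapped to an RGB
    image by a small encoder-decoder."""
    def __init__(self, base=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, base, 4, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(base, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, sketch, color_strokes):
        # sketch: (B, 1, H, W) edges; color_strokes: (B, 3, H, W), zero where unspecified
        return self.net(torch.cat([sketch, color_strokes], dim=1))

g = SketchColorGenerator()
fake = g(torch.rand(1, 1, 128, 128), torch.zeros(1, 3, 128, 128))
print(fake.shape)  # (1, 3, 128, 128): an image honoring the sketch and color constraints
```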


computer vision and pattern recognition | 2017

End-to-End Learning of Driving Models from Large-Scale Video Datasets

Huazhe Xu; Yang Gao; Fisher Yu; Trevor Darrell

Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.
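
A rough, hypothetical sketch of the FCN-LSTM idea follows: per-frame convolutional features and the previous vehicle state feed an LSTM, whose output is turned into a distribution over discrete egomotion actions. The tiny convolutional stack, state dimensionality, and action set are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FCNLSTMPolicy(nn.Module):
    """Toy perception-to-egomotion model: convolutional features per frame,
    concatenated with the previous vehicle state, drive an LSTM that outputs
    a distribution over discrete future actions."""
    def __init__(self, state_dim=2, hidden=64, num_actions=4):
        super().__init__()
        self.fcn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4, padding=2), nn.ReLU(True),
            nn.Conv2d(16, 32, 5, stride=4, padding=2), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(32 + state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, frames, states):
        # frames: (B, T, 3, H, W) monocular video; states: (B, T, state_dim)
        b, t = frames.shape[:2]
        feats = self.fcn(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.lstm(torch.cat([feats, states], dim=-1))
        return self.head(out).softmax(dim=-1)   # (B, T, num_actions) distribution

probs = FCNLSTMPolicy()(torch.rand(2, 5, 3, 96, 96), torch.rand(2, 5, 2))
print(probs.shape)  # torch.Size([2, 5, 4])
```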


international conference on computer graphics and interactive techniques | 2012

HelpingHand: example-based stroke stylization

Jingwan Lu; Fisher Yu; Adam Finkelstein; Stephen DiVerdi

Digital painters commonly use a tablet and stylus to drive software like Adobe Photoshop. A high quality stylus with 6 degrees of freedom (DOFs: 2D position, pressure, 2D tilt, and 1D rotation) coupled to a virtual brush simulation engine allows skilled users to produce expressive strokes in their own style. However, such devices are difficult for novices to control, and many people draw with less expensive (lower DOF) input devices. This paper presents a data-driven approach for synthesizing the 6D hand gesture data for users of low-quality input devices. Offline, we collect a library of strokes with 6D data created by trained artists. Online, given a query stroke as a series of 2D positions, we synthesize the 4D hand pose data at each sample based on samples from the library that locally match the query. This framework can optionally also modify the stroke trajectory to match characteristic shapes in the style of the library. Our algorithm outputs a 6D trajectory that can be fed into any virtual brush stroke engine to make expressive strokes for novices or users of limited hardware.
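
The synthesis step can be caricatured as a nearest-neighbour lookup: each sample of the 2D query stroke borrows the 4D pose data (pressure, 2D tilt, rotation) of the library sample whose local shape matches best. The sketch below uses a crude tangent-direction descriptor and brute-force matching purely for illustration; the actual system matches richer local neighbourhoods and can also retarget the stroke trajectory.

```python
import numpy as np

def local_descriptor(xy):
    """Describe each 2D stroke sample by its unit tangent direction
    (a crude stand-in for the paper's local shape matching)."""
    d = np.gradient(xy, axis=0)
    return d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)

def synthesize_pose(query_xy, library_xy, library_pose):
    """For every sample of a 2D query stroke, copy the 4D pose (pressure,
    tilt_x, tilt_y, rotation) of the best locally matching library sample."""
    q, l = local_descriptor(query_xy), local_descriptor(library_xy)
    nearest = np.argmin(((q[:, None, :] - l[None, :, :]) ** 2).sum(-1), axis=1)
    return np.hstack([query_xy, library_pose[nearest]])   # (N, 6) trajectory

# Toy example: a straight query stroke borrows pose data from a curved library stroke.
t = np.linspace(0, 1, 50)
library_xy = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
library_pose = np.random.rand(50, 4)          # pressure, tilt_x, tilt_y, rotation
query_xy = np.stack([t, 0.2 * t], axis=1)
print(synthesize_pose(query_xy, library_xy, library_pose).shape)  # (50, 6)
```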


computer vision and pattern recognition | 2015

Semantic alignment of LiDAR data at city scale

Fisher Yu; Jianxiong Xiao; Thomas A. Funkhouser

This paper describes an automatic algorithm for global alignment of LiDAR data collected with Google Street View cars in urban environments. The problem is challenging because global pose estimation techniques (GPS) do not work well in city environments with tall buildings, and local tracking techniques (integration of inertial sensors, structure-from-motion, etc.) provide solutions that drift over long ranges, leading to solutions where data collected over wide ranges is warped and misaligned by many meters. Our approach to address this problem is to extract “semantic features” with object detectors (e.g., for facades, poles, cars, etc.) that can be matched robustly at different scales, and thus are selected for different iterations of an ICP algorithm. We have implemented an all-to-all, non-rigid, global alignment based on this idea that provides better results than alternatives during experiments with data from large regions of New York, San Francisco, Paris, and Rome.
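
The core idea, an ICP iteration in which correspondences are restricted to points sharing a semantic label, can be sketched as below. This is a rigid, pairwise caricature of the paper's all-to-all non-rigid alignment; the label set and the closed-form (Kabsch) rigid fit are assumptions for illustration.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rigid transform (Kabsch) mapping src points onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - cs).T @ (dst - cd))
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, cd - r @ cs

def semantic_icp_step(src, src_labels, dst, dst_labels):
    """One ICP iteration where correspondences are restricted to points that
    share a semantic label (e.g. facade / pole / car detections)."""
    pairs_src, pairs_dst = [], []
    for lbl in np.intersect1d(src_labels, dst_labels):
        s, d = src[src_labels == lbl], dst[dst_labels == lbl]
        nn = np.argmin(((s[:, None, :] - d[None, :, :]) ** 2).sum(-1), axis=1)
        pairs_src.append(s)
        pairs_dst.append(d[nn])
    r, t = rigid_fit(np.vstack(pairs_src), np.vstack(pairs_dst))
    return src @ r.T + t                      # source points after re-alignment

# Toy example: the "destination" scan is the source scan shifted slightly.
src = np.random.rand(200, 3)
labels = np.random.randint(0, 3, size=200)    # hypothetical facade/pole/car ids
dst = src + np.array([0.05, 0.0, 0.0])
aligned = semantic_icp_step(src, labels, dst, labels)
print(np.abs(aligned - dst).max())            # residual misalignment after one step
```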


international conference on computer graphics and interactive techniques | 2016

Automatic triage for a photo series

Huiwen Chang; Fisher Yu; Jue Wang; Douglas Ashley; Adam Finkelstein

People often take a series of nearly redundant pictures to capture a moment or scene. However, selecting photos to keep or share from a large collection is a painful chore. To address this problem, we seek a relative quality measure within a series of photos taken of the same scene, which can be used for automatic photo triage. Towards this end, we gather a large dataset comprising photo series distilled from personal photo albums. The dataset contains 15,545 unedited photos organized in 5,953 series. By augmenting this dataset with ground truth human preferences among photos within each series, we establish a benchmark for measuring the effectiveness of algorithmic models of how people select photos. We introduce several new approaches for modeling human preference based on machine learning. We also describe applications for the dataset and predictor, including a smart album viewer, automatic photo enhancement, and providing overviews of video clips.
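
Learning from ground-truth human preferences within a series naturally becomes a pairwise ranking problem: a scorer should assign the preferred photo of each annotated pair a higher score than the rejected one. The sketch below shows one hypothetical way to set this up with a margin ranking loss over precomputed photo features; the feature dimension and the scorer are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

# A small scorer maps per-photo features to a scalar quality score; training
# pushes the preferred photo of each pair above the rejected one by a margin.
feat_dim = 512
scorer = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
loss_fn = nn.MarginRankingLoss(margin=0.2)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# Toy batch: features of the preferred and the rejected photo of each pair.
preferred = torch.randn(16, feat_dim)
rejected = torch.randn(16, feat_dim)
target = torch.ones(16, 1)            # +1 means "first input should rank higher"

loss = loss_fn(scorer(preferred), scorer(rejected), target)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```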


european conference on computer vision | 2018

SkipNet: Learning Dynamic Routing in Convolutional Networks

Xin Wang; Fisher Yu; Zi-Yi Dou; Trevor Darrell; Joseph E. Gonzalez

While deeper convolutional networks are needed to achieve maximum accuracy in visual perception tasks, for many inputs shallower networks are sufficient. We exploit this observation by learning to skip convolutional layers on a per-input basis. We introduce SkipNet, a modified residual network that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer. We formulate the dynamic skipping problem in the context of sequential decision making and propose a hybrid learning algorithm that combines supervised learning and reinforcement learning to address the challenges of non-differentiable skipping decisions. We show that SkipNet reduces computation by 30-90% while preserving the accuracy of the original model on four benchmark datasets, and that it outperforms state-of-the-art dynamic networks and static compression methods. We also qualitatively evaluate the gating policy to reveal a relationship between image scale and saliency and the number of layers skipped.
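
The routing mechanism can be sketched as a gated residual block: a tiny gate inspects the incoming activations and decides, per input, whether to execute the block or pass the input through unchanged. The sketch below uses a hard sigmoid threshold purely for illustration; the paper's hybrid supervised/reinforcement-learning training of the gates is omitted.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Toy dynamic-routing block: a pooled linear gate makes a per-example
    decision to run the residual body or to skip it."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, 1))

    def forward(self, x):
        keep = (torch.sigmoid(self.gate(x)) > 0.5).float().view(-1, 1, 1, 1)
        # Skipped inputs pass through unchanged; a real implementation would
        # avoid computing body(x) for them to actually save computation.
        return x + keep * self.body(x)

x = torch.randn(4, 64, 32, 32)
print(GatedResidualBlock()(x).shape)   # per-example skip decision, same shape
```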


european conference on computer vision | 2018

Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

Chaowei Xiao; Ruizhi Deng; Bo Li; Fisher Yu; Mingyan Liu; Dawn Song

Deep Neural Networks (DNNs) have been widely applied in various recognition tasks. However, recently DNNs have been shown to be vulnerable against adversarial examples, which can mislead DNNs to make arbitrary incorrect predictions. While adversarial examples are well studied in classification tasks, other learning problems may have different properties. For instance, semantic segmentation requires additional components such as dilated convolutions and multiscale processing. In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation. We observe that spatial consistency information can be potentially leveraged to detect adversarial examples robustly even when a strong adaptive attacker has access to the model and detection strategies. We also show that adversarial examples based on attacks considered within the paper barely transfer among models, even though transferability is common in classification. Our observations shed new light on developing adversarial attacks and defenses to better understand the vulnerabilities of DNNs.
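
The detection signal can be sketched as follows: segment two overlapping crops of the same image independently and measure how often their per-pixel labels agree on the shared region; benign inputs tend to be consistent, adversarial ones much less so. The crop geometry, agreement metric, and dummy model below are assumptions for illustration.

```python
import torch

def spatial_consistency(model, image, patch=256, overlap=128):
    """Segment two horizontally overlapping crops independently and return
    the fraction of shared pixels on which their predicted labels agree."""
    _, h, w = image.shape
    x = torch.randint(0, w - 2 * patch + overlap + 1, (1,)).item()
    y = torch.randint(0, h - patch + 1, (1,)).item()
    crop_a = image[:, y:y + patch, x:x + patch]
    crop_b = image[:, y:y + patch, x + patch - overlap:x + 2 * patch - overlap]
    pred_a = model(crop_a.unsqueeze(0)).argmax(1)[0]   # (patch, patch) labels
    pred_b = model(crop_b.unsqueeze(0)).argmax(1)[0]
    shared_a = pred_a[:, patch - overlap:]             # overlap region in crop A
    shared_b = pred_b[:, :overlap]                     # same pixels in crop B
    return (shared_a == shared_b).float().mean().item()

# Dummy "segmentation model" (random per-pixel logits) just to show the shapes;
# a real model would score near 1.0 on benign inputs and lower when attacked.
dummy = lambda img: torch.randn(img.shape[0], 19, img.shape[2], img.shape[3])
print(spatial_consistency(dummy, torch.rand(3, 512, 1024)))
```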

Collaboration


Dive into Fisher Yu's collaborations.

Top Co-Authors

Trevor Darrell

University of California

Xin Wang

University of California
