Yangyan Li
Stanford University
Publications
Featured research published by Yangyan Li.
international conference on computer vision | 2015
Hao Su; Charles Ruizhongtai Qi; Yangyan Li; Leonidas J. Guibas
Object viewpoint estimation from 2D images is an essential task in computer vision. However, two issues hinder its progress: scarcity of training data with viewpoint annotations, and a lack of powerful features. Inspired by the growing availability of 3D models, we propose a framework to address both issues by combining render-based image synthesis and CNNs (Convolutional Neural Networks). We believe that 3D models have the potential to generate a large number of images of high variation, which can be well exploited by deep CNNs with high learning capacity. Towards this goal, we propose a scalable and overfit-resistant image synthesis pipeline, together with a novel CNN specifically tailored for the viewpoint estimation task. Experimentally, we show that the viewpoint estimation from our pipeline can significantly outperform state-of-the-art methods on the PASCAL 3D+ benchmark.
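The pipeline pairs large-scale rendering of 3D models with a CNN trained on the synthesized images. A minimal sketch of one training step on such data is below; the tiny network, the 24-bin azimuth discretization, and the random placeholder batch are illustrative assumptions, not the authors' architecture or synthesis pipeline.

```python
# Minimal sketch: train a small CNN to classify discretized object azimuth.
# The architecture, the 24-bin discretization, and the random "rendered" batch
# are placeholder assumptions standing in for the paper's synthesis pipeline.
import torch
import torch.nn as nn

NUM_AZIMUTH_BINS = 24  # viewpoint treated as classification over angle bins

class ViewpointCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, NUM_AZIMUTH_BINS)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = ViewpointCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Stand-in for a batch of rendered images and their ground-truth azimuth bins.
images = torch.randn(16, 3, 64, 64)
labels = torch.randint(0, NUM_AZIMUTH_BINS, (16,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```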
international conference on computer graphics and interactive techniques | 2011
Yangyan Li; Xiaokun Wu; Yiorgos Chrysathou; Andrei Sharf; Daniel Cohen-Or; Niloy J. Mitra
Given a noisy and incomplete point set, we introduce a method that simultaneously recovers a set of locally fitted primitives along with their global mutual relations. We operate under the assumption that the data corresponds to a man-made engineering object consisting of basic primitives, possibly repeated and globally aligned under common relations. We introduce an algorithm to directly couple the local and global aspects of the problem. The local fit of the model is determined by how well the inferred model agrees with the observed data, while the global relations are iteratively learned and enforced through a constrained optimization. Starting with a set of initial RANSAC-based locally fitted primitives, relations across the primitives such as orientation, placement, and equality are progressively learned and conformed to. In each stage, a set of feasible relations is extracted from the candidate relations and then aligned to, while best fitting the input data. The global coupling corrects the primitives obtained in the local RANSAC stage and brings them into precise global alignment. We test the robustness of our algorithm on a range of synthesized and scanned data, with varying amounts of noise, outliers, and non-uniform sampling, and validate the results against ground truth, where available.
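A minimal sketch of the global-coupling idea, under the simplifying assumption that the only relation enforced is parallelism of primitive normals; the greedy clustering rule and the 10-degree threshold are illustrative, not the paper's constrained optimization.

```python
# Minimal sketch of "global coupling": normals of locally fitted primitives
# that are nearly parallel are snapped to a shared direction. The clustering
# rule and the 10-degree threshold are illustrative assumptions.
import numpy as np

def snap_parallel_normals(normals, angle_thresh_deg=10.0):
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    groups, assigned = [], np.full(len(normals), -1)
    for i, n in enumerate(normals):
        for g, rep in enumerate(groups):
            if abs(np.dot(n, rep)) > cos_thresh:   # near-parallel (either sign)
                assigned[i] = g
                break
        else:
            groups.append(n.copy())
            assigned[i] = len(groups) - 1
    snapped = normals.copy()
    for g in range(len(groups)):
        members = np.where(assigned == g)[0]
        # Flip members to a consistent hemisphere before averaging.
        ref = normals[members[0]]
        signs = np.sign(normals[members] @ ref)
        mean_dir = (normals[members] * signs[:, None]).mean(axis=0)
        mean_dir /= np.linalg.norm(mean_dir)
        snapped[members] = mean_dir[None, :] * signs[:, None]
    return snapped

# Example: three noisy near-vertical normals plus one horizontal normal.
normals = np.array([[0.02, 0.01, 1.0], [0.0, -0.03, 1.0],
                    [-0.01, 0.02, 1.0], [1.0, 0.0, 0.05]])
print(snap_parallel_normals(normals))
```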
international conference on computer graphics and interactive techniques | 2010
Qian Zheng; Andrei Sharf; Guowei Wan; Yangyan Li; Niloy J. Mitra; Daniel Cohen-Or; Baoquan Chen
Recent advances in scanning technologies, in particular devices that extract depth through active sensing, allow fast scanning of urban scenes. Such rapid acquisition incurs imperfections: large regions remain missing, significant variation in sampling density is common, and the data is often corrupted with noise and outliers. However, buildings often exhibit large-scale repetitions and self-similarities. Detecting, extracting, and utilizing such large-scale repetitions provides a powerful means to consolidate the imperfect data. Our key observation is that the same geometry, when scanned multiple times across recurring instances, allows the application of a simple yet effective non-local filtering. The multiple observations of the geometry are fused together and projected to a base geometry defined by clustering corresponding surfaces. Denoising is applied by separating the process into off-plane and in-plane phases. We show that consolidating the recurrences provides robust denoising and allows reliable completion of missing parts. We present evaluation results of the algorithm on several LiDAR scans of buildings of varying complexity and styles.
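A minimal sketch of the consolidation idea, assuming point-to-point correspondence between registered recurrences is already given; the fusion and the off-plane/in-plane split are reduced to an averaging and a single plane fit, which is far simpler than the paper's pipeline.

```python
# Minimal sketch of non-local consolidation: registered recurrences of the
# same facade element are averaged into a base geometry, and off-plane noise
# is suppressed separately from in-plane positions. Point-to-point
# correspondence between repetitions is assumed to be given.
import numpy as np

def fit_plane(points):
    """Least-squares plane through points: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def consolidate(repetitions, off_plane_weight=0.2):
    """repetitions: array (k, n, 3) of k registered copies with n points each."""
    base = repetitions.mean(axis=0)            # fuse corresponding samples
    centroid, normal = fit_plane(base)         # dominant facade plane
    offsets = (base - centroid) @ normal       # signed off-plane distances
    # Shrink off-plane deviations (noise) while keeping in-plane structure.
    return base - (1.0 - off_plane_weight) * offsets[:, None] * normal

copies = np.random.randn(4, 100, 3) * [1.0, 1.0, 0.05] + [0.0, 0.0, 5.0]
clean = consolidate(copies)
print(clean.shape)
```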
international conference on 3d vision | 2016
Wenzheng Chen; Huan Wang; Yangyan Li; Hao Su; Zhenhua Wang; Changhe Tu; Dani Lischinski; Daniel Cohen-Or; Baoquan Chen
Human 3D pose estimation from a single image is a challenging task with numerous applications. Convolutional Neural Networks (CNNs) have recently achieved superior performance on the task of 2D pose estimation from a single image, by training on images with 2D annotations collected by crowdsourcing. This suggests that similar success could be achieved for direct estimation of 3D poses. However, 3D poses are much harder to annotate, and the lack of suitable annotated training images hinders attempts towards end-to-end solutions. To address this issue, we opt to automatically synthesize training images with ground truth pose annotations. Our work is a systematic study along this direction. We find that pose space coverage and texture diversity are the key ingredients for the effectiveness of synthetic training data. We present a fully automatic, scalable approach that samples the human pose space to guide the synthesis procedure and extracts clothing textures from real images. Furthermore, we explore domain adaptation for bridging the gap between our synthetic training images and real testing photos. We demonstrate that CNNs trained with our synthetic images outperform those trained with real photos on 3D pose estimation tasks.
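A minimal sketch of training a pose regressor on synthetic data; the 16-joint skeleton, the tiny network, and the random placeholder batch are illustrative assumptions, and the synthesis and domain adaptation stages described above are omitted.

```python
# Minimal sketch: regress 3D joint coordinates from an image with a small CNN
# trained on synthetic renderings. The 16-joint skeleton, the tiny network,
# and the random placeholder batch are illustrative assumptions.
import torch
import torch.nn as nn

NUM_JOINTS = 16

regressor = nn.Sequential(
    nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, NUM_JOINTS * 3),             # (x, y, z) per joint
)
optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-3)

# Stand-ins for synthetic training images and their ground-truth 3D poses.
images = torch.randn(8, 3, 64, 64)
poses = torch.randn(8, NUM_JOINTS * 3)

optimizer.zero_grad()
loss = nn.functional.mse_loss(regressor(images), poses)
loss.backward()
optimizer.step()
print(float(loss))
```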
international conference on computer vision | 2011
Yangyan Li; Qian Zheng; Andrei Sharf; Daniel Cohen-Or; Baoquan Chen; Niloy J. Mitra
We present a method for fusing two acquisition modes, 2D photographs and 3D LiDAR scans, for depth-layer decomposition of urban facades. The two modes have complementary characteristics: point cloud scans are coherent and inherently 3D, but are often sparse, noisy, and incomplete; photographs, on the other hand, are of high resolution, easy to acquire, and dense, but view-dependent and inherently 2D, lacking critical depth information. In this paper we use photographs to enhance the acquired LiDAR data. Our key observation is that with an initial registration of the 2D and 3D datasets we can decompose the input photographs into rectified depth layers. We decompose the input photographs into rectangular planar fragments and diffuse depth information from the corresponding 3D scan onto the fragments by solving a multi-label assignment problem. Our layer decomposition enables accurate repetition detection in each planar layer, using which we propagate geometry, remove outliers and enhance the 3D scan. Finally, the algorithm produces an enhanced, layered, textured model. We evaluate our algorithm on complex multi-planar building facades, where direct autocorrelation methods for repetition detection fail. We demonstrate how 2D photographs help improve the 3D scans by exploiting data redundancy, and transferring high level structural information to (plausibly) complete large missing regions.
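A minimal sketch of the multi-label depth assignment, with iterated conditional modes standing in for the paper's optimizer and hand-picked costs and depth layers as illustrative assumptions.

```python
# Minimal sketch of the multi-label assignment: each 2D fragment picks one of
# a few candidate depth layers so that it stays close to its own LiDAR depth
# samples while agreeing with adjacent fragments. Iterated conditional modes
# stands in here for the paper's optimizer; the costs are illustrative.
import numpy as np

def assign_depth_layers(frag_depths, adjacency, layers, smooth=0.5, iters=10):
    """frag_depths: mean LiDAR depth per fragment (NaN if unobserved).
    adjacency: list of (i, j) pairs of neighboring fragments.
    layers: candidate depth values for the facade's planar layers."""
    labels = np.zeros(len(frag_depths), dtype=int)
    for _ in range(iters):
        for i, d in enumerate(frag_depths):
            costs = np.zeros(len(layers))
            if not np.isnan(d):                       # data term
                costs += (layers - d) ** 2
            for a, b in adjacency:                    # smoothness term
                if i in (a, b):
                    other = b if i == a else a
                    costs += smooth * (layers != layers[labels[other]])
            labels[i] = int(np.argmin(costs))
    return labels

layers = np.array([0.0, 0.4, 1.2])                    # wall, window, balcony
frag_depths = np.array([0.05, np.nan, 1.25, 0.38])    # NaN: hole in the scan
adjacency = [(0, 1), (1, 2), (2, 3)]
print(assign_depth_layers(frag_depths, adjacency, layers))
```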
Computer Graphics Forum | 2015
Yangyan Li; Angela Dai; Leonidas J. Guibas; Matthias Nießner
In recent years, real‐time 3D scanning technology has developed significantly and is now able to capture large environments with considerable accuracy. Unfortunately, the reconstructed geometry still suffers from incompleteness, due to occlusions and lack of view coverage, resulting in unsatisfactory reconstructions. In order to overcome these fundamental physical limitations, we present a novel reconstruction approach based on retrieving objects from a 3D shape database while scanning an environment in real‐time. With this approach, we are able to replace scanned RGB‐D data with complete, hand‐modeled objects from shape databases. We align and scale retrieved models to the input data to obtain a high‐quality virtual representation of the real‐world environment that is quite faithful to the original geometry. In contrast to previous methods, we are able to retrieve objects in cluttered and noisy scenes even when the database contains only similar models, but no exact matches. In addition, we put a strong focus on object retrieval in an interactive scanning context — our algorithm runs directly on 3D scanning data structures, and is able to query databases of thousands of models in an online fashion during scanning.
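A minimal sketch of online retrieval against a model database, using a simple radial-distance histogram as the descriptor and a KD-tree as the index; both are illustrative assumptions, not the paper's matching algorithm.

```python
# Minimal sketch of online retrieval: each database model is summarized by a
# lightweight, rotation-invariant descriptor (a histogram of radial distances),
# and a scanned segment is matched by nearest-neighbor search. The descriptor
# and the KD-tree index are illustrative; the paper's matching is more involved.
import numpy as np
from scipy.spatial import cKDTree

def radial_histogram(points, bins=16):
    centered = points - points.mean(axis=0)
    radii = np.linalg.norm(centered, axis=1)
    radii /= radii.max() + 1e-9                      # scale invariance
    hist, _ = np.histogram(radii, bins=bins, range=(0.0, 1.0), density=True)
    return hist

# Build an index over a toy database of randomly shaped point-sampled "models".
database = [np.random.randn(500, 3) * [1.0, 1.0, 0.1 + 0.1 * i] for i in range(20)]
index = cKDTree(np.stack([radial_histogram(m) for m in database]))

# Query with a noisy, partial re-observation of database model 7.
scan = database[7][:300] + 0.01 * np.random.randn(300, 3)
_, nearest = index.query(radial_histogram(scan))
print("retrieved model:", nearest)
```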
international conference on computer graphics and interactive techniques | 2013
Yangyan Li; Xiaochen Fan; Niloy J. Mitra; Daniel A. Chamovitz; Daniel Cohen-Or; Baoquan Chen
Studying growth and development of plants is of central importance in botany. Current quantitative methods are limited either to tedious and sparse manual measurements or to coarse image-based 2D measurements. The availability of cheap and portable 3D acquisition devices has the potential to automate this process and easily provide scientists with volumes of accurate data, at a scale well beyond the reach of existing methods. However, during their development, plants grow new parts (e.g., vegetative buds) and bifurcate into different components --- violating the central incompressibility assumption made by existing acquisition algorithms, which makes these algorithms unsuited for analyzing growth. We introduce a framework to study plant growth, particularly focusing on accurate localization and tracking of topological events like budding and bifurcation. This is achieved by a novel forward-backward analysis, wherein we track robustly detected plant components back in time to ensure correct spatio-temporal event detection using a locally adapting threshold. We evaluate our approach on several groups of time-lapse scans, often spanning days to weeks, on a diverse set of plant species, and use the results to animate static virtual plants or directly attach them to physical simulators.
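A minimal sketch of the forward-backward event detection, reduced to component centroids per frame; the nearest-neighbor matching and the locally adapting threshold rule are illustrative assumptions.

```python
# Minimal sketch of forward-backward event detection: a component detected at
# time t with no counterpart in frame t-1 (within a threshold adapted to the
# local component spacing) is reported as a budding/bifurcation candidate.
# The matching and the threshold rule are illustrative assumptions.
import numpy as np

def detect_new_components(frames, scale=0.5):
    """frames: list of (n_t, 3) arrays of component centroids per time step."""
    events = []
    for t in range(1, len(frames)):
        prev, curr = frames[t - 1], frames[t]
        # Locally adapting threshold: a fraction of median spacing in frame t.
        d_curr = np.linalg.norm(curr[:, None] - curr[None, :], axis=-1)
        np.fill_diagonal(d_curr, np.inf)
        thresh = scale * np.median(d_curr.min(axis=1))
        # Backward check: does each current component exist in the previous frame?
        d_back = np.linalg.norm(curr[:, None] - prev[None, :], axis=-1).min(axis=1)
        for i in np.where(d_back > thresh)[0]:
            events.append((t, i))                  # (frame, component) of new part
    return events

frames = [np.array([[0, 0, 0], [1, 0, 0]], float),
          np.array([[0, 0, 0.1], [1, 0, 0.1]], float),
          np.array([[0, 0, 0.2], [1, 0, 0.2], [0.5, 0.4, 0.2]], float)]  # bud appears
print(detect_new_components(frames))
```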
international conference on computer graphics and interactive techniques | 2015
Matthew Fisher; Manolis Savva; Yangyan Li; Pat Hanrahan; Matthias Nießner
We present a novel method to generate 3D scenes that allow the same activities as real environments captured through noisy and incomplete 3D scans. As robust object detection and instance retrieval from low-quality depth data is challenging, our algorithm aims to model semantically-correct rather than geometrically-accurate object arrangements. Our core contribution is a new scene synthesis technique which, conditioned on a coarse geometric scene representation, models functionally similar scenes using prior knowledge learned from a scene database. The key insight underlying our scene synthesis approach is that many real-world environments are structured to facilitate specific human activities, such as sleeping or eating. We represent scene functionalities through virtual agents that associate object arrangements with the activities for which they are typically used. When modeling a scene, we first identify the activities supported by a scanned environment. We then determine semantically-plausible arrangements of virtual objects -- retrieved from a shape database -- constrained by the observed scene geometry. For a given 3D scan, our algorithm produces a variety of synthesized scenes which support the activities of the captured real environments. In a perceptual evaluation study, we demonstrate that our results are judged to be visually appealing and functionally comparable to manually designed scenes.
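A minimal sketch of scoring candidate object arrangements against a virtual agent's activity; the Gaussian distance priors and the toy candidates are illustrative assumptions, not the model learned from the scene database.

```python
# Minimal sketch of the activity scoring idea: a virtual agent for an activity
# (e.g. "eating") prefers arrangements in which each associated object sits at
# a learned distance from the agent. The Gaussian distance priors and the toy
# candidates are illustrative assumptions, not the paper's learned model.
import numpy as np

# Activity prior: object category -> (preferred distance to agent, std. dev.)
EATING_PRIOR = {"chair": (0.0, 0.2), "table": (0.6, 0.2), "plate": (0.7, 0.3)}

def activity_score(agent_pos, placements, prior):
    """placements: dict of object category -> 2D position."""
    score = 0.0
    for category, pos in placements.items():
        mu, sigma = prior[category]
        d = np.linalg.norm(np.asarray(pos) - np.asarray(agent_pos))
        score += -0.5 * ((d - mu) / sigma) ** 2      # log of a Gaussian prior
    return score

agent = (0.0, 0.0)
candidates = [
    {"chair": (0.0, 0.0), "table": (0.6, 0.0), "plate": (0.7, 0.1)},
    {"chair": (1.5, 0.0), "table": (0.1, 0.0), "plate": (2.0, 2.0)},
]
best = max(candidates, key=lambda c: activity_score(agent, c, EATING_PRIOR))
print(best)
```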
advances in geographic information systems | 2016
Yang Li; Yangyan Li; Dimitrios Gunopulos; Leonidas J. Guibas
Traffic trajectories collected from GPS-enabled mobile devices or vehicles are widely used in urban planning, traffic management, and location-based services. The performance of these applications often relies on dense trajectories. However, due to power and bandwidth limitations on these devices, collecting dense trajectories is too costly on a large scale. We show that by exploiting structural regularity in large trajectory data, the complete geometry of trajectories can be inferred from sparse GPS samples without information about the underlying road network - a process called trajectory completion. In this paper, we present a knowledge-based approach for completing traffic trajectories. Our method extracts a network of road junctions and estimates traffic flows across junctions. GPS samples within each flow cluster are then used to achieve fine-level completion of individual trajectories. Finally, we demonstrate that our method is effective for trajectory completion on both synthesized and real traffic trajectories. On average, 72.7% of real trajectories with a sampling rate of 60 seconds per sample are completed without map information. Compared to map matching, over 89% of points on the completed trajectories are within 15 meters of the map-matched path.
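A minimal sketch of flow-based completion, assuming a dense centerline for the flow cluster has already been estimated from other trajectories; the toy centerline and the snapping rule are illustrative, not the paper's junction and flow estimation.

```python
# Minimal sketch of flow-based completion: sparse GPS samples are snapped to a
# dense centerline estimated from other trajectories in the same flow cluster,
# and the missing geometry between consecutive samples is filled in from the
# centerline. The toy centerline is an illustrative assumption.
import numpy as np

def complete_trajectory(sparse_points, centerline):
    """Return the centerline polyline spanning the sparse samples, in order."""
    idx = [int(np.argmin(np.linalg.norm(centerline - p, axis=1)))
           for p in sparse_points]
    completed = [centerline[idx[0]]]
    for a, b in zip(idx[:-1], idx[1:]):
        step = 1 if b >= a else -1
        completed.extend(centerline[a + step: b + step: step])
    return np.array(completed)

# Toy flow cluster: a dense centerline with 100 vertices along a gentle curve.
t = np.linspace(0, 1, 100)
centerline = np.stack([t, 0.1 * np.sin(4 * np.pi * t)], axis=1)

# A sparse trajectory that only observed every 25th position, with GPS noise.
sparse = centerline[::25] + 0.005 * np.random.randn(4, 2)
print(complete_trajectory(sparse, centerline).shape)
```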
Visual Informatics | 2018
Qiong Zeng; Wenzheng Chen; Zhuo Han; Mingyi Shi; Yanir Kleiman; Daniel Cohen-Or; Baoquan Chen; Yangyan Li
Understanding semantic similarity among images is at the core of a wide range of computer graphics and computer vision applications. However, the visual context of images is often ambiguous, as images can be perceived with emphasis on different attributes. In this paper, we present a method for learning the semantic visual similarity among images, inferring their latent attributes and embedding them into multiple spaces corresponding to each latent attribute. We consider the multi-embedding problem as an optimization function that evaluates the embedded distances with respect to qualitative crowdsourced clusterings. The key idea of our approach is to collect and embed qualitative pairwise tuples that share the same attributes in clusters. To ensure similarity attribute sharing among multiple measures, image clustering tasks are presented to, and solved by, users. The collected image clusters are then converted into groups of tuples, which are fed into our group optimization algorithm that jointly infers the attribute similarity and the multi-attribute embedding. Our multi-attribute embedding allows retrieving similar objects in different attribute spaces. Experimental results show that our approach outperforms state-of-the-art multi-embedding approaches on various datasets, and we demonstrate the use of the multi-attribute embedding in an image retrieval application.
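A minimal sketch of fitting one embedding space per attribute from crowdsourced tuples with a triplet margin loss; the joint inference of which attribute each tuple expresses is omitted, and the toy tuples are illustrative.

```python
# Minimal sketch of multi-attribute embedding: each attribute gets its own
# embedding table, and crowdsourced tuples (anchor, similar, dissimilar,
# attribute) are fitted with a triplet margin loss in that attribute's space.
# The joint inference of which attribute a tuple expresses is omitted here.
import torch
import torch.nn as nn

NUM_IMAGES, NUM_ATTRIBUTES, DIM = 100, 3, 8
spaces = nn.ModuleList(nn.Embedding(NUM_IMAGES, DIM) for _ in range(NUM_ATTRIBUTES))
optimizer = torch.optim.Adam(spaces.parameters(), lr=1e-2)
triplet = nn.TripletMarginLoss(margin=1.0)

# Stand-in crowdsourced tuples: (anchor, similar, dissimilar, attribute index).
tuples = [(0, 1, 2, 0), (3, 4, 5, 1), (0, 6, 1, 2)]

for _ in range(100):
    optimizer.zero_grad()
    loss = torch.zeros(())
    for a, p, n, k in tuples:
        emb = spaces[k]
        ea, ep, en = emb(torch.tensor([a, p, n]))
        loss = loss + triplet(ea[None], ep[None], en[None])
    loss.backward()
    optimizer.step()
print(float(loss))
```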