Jianxiong Xiao
Hong Kong University of Science and Technology
Publications
Featured research published by Jianxiong Xiao.
international conference on computer vision | 2013
Jianxiong Xiao; Andrew Owens; Antonio Torralba
Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation -- hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them, the dataset construction task becomes much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available at http://sun3d.cs.princeton.edu.
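The bounding-box constraint at the heart of the generalized bundle adjustment is easy to illustrate. Below is a minimal Python sketch (ours, not the released code) of the residual it implies: points labeled as one object are moved into the object's local frame and penalized for leaving a fixed-size box. Function and variable names are assumptions.

```python
import numpy as np

def box_violation(points_world, R_obj, t_obj, half_extent):
    """Residuals measuring how far each 3D point lies outside the object's
    fixed-size bounding box.

    points_world: (N, 3) points for one object, lifted to world coordinates
                  via the current camera-pose estimates.
    R_obj, t_obj: the box's rotation (3x3) and translation (3,).
    half_extent:  (3,) half-sizes of the fixed bounding box.
    """
    local = (points_world - t_obj) @ R_obj  # express points in the box frame
    return np.maximum(np.abs(local) - half_extent, 0.0)  # zero inside the box

# In a full system these residuals would be stacked with the usual
# reprojection errors and minimized jointly over camera poses, 3D points,
# and box poses, e.g. with scipy.optimize.least_squares.
```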
international conference on computer graphics and interactive techniques | 2008
Jianxiong Xiao; Tian Fang; Ping Tan; Peng Zhao; Eyal Ofek; Long Quan
We propose in this paper a semi-automatic image-based approach to facade modeling that uses images captured along streets and relies on structure from motion to recover camera positions and point clouds automatically as the initial stage for modeling. We start by considering a building facade as a flat rectangular plane or a developable surface with an associated texture image composited from the multiple visible images. A facade is then decomposed and structured into a Directed Acyclic Graph of rectilinear elementary patches. The decomposition is carried out top-down by a recursive subdivision, followed by a bottom-up merging with the detection of the architectural bilateral symmetry and repetitive patterns. Each subdivided patch of the flat facade is augmented with a depth optimized using the 3D point cloud. Our system also allows for easy user feedback in the 2D image space on the proposed decomposition and augmentation. Finally, our approach is demonstrated on a large number of facades from a variety of street-side images.
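As a rough illustration of the top-down stage, here is a toy Python sketch that recursively splits a rectified facade rectangle at the strongest horizontal or vertical edge response. This is an assumption about one plausible split criterion; the paper's bottom-up merging via symmetry and repetition, and the per-patch depth optimization, are omitted.

```python
import numpy as np

def subdivide(edges, x0, y0, x1, y1, min_size=16, thresh=50.0):
    """Recursively split the rectangle [x0:x1) x [y0:y1) of an edge-response
    map at its strongest row/column; returns a nested dict (tree of patches)."""
    node = {"rect": (x0, y0, x1, y1), "children": []}
    h, w = y1 - y0, x1 - x0
    score_h = score_v = -1.0
    if h > 2 * min_size:  # candidate horizontal splits keep both halves tall enough
        row = edges[y0:y1, x0:x1].sum(axis=1)
        ry = min_size + int(np.argmax(row[min_size:h - min_size]))
        score_h = float(row[ry])
    if w > 2 * min_size:  # likewise for vertical splits
        col = edges[y0:y1, x0:x1].sum(axis=0)
        rx = min_size + int(np.argmax(col[min_size:w - min_size]))
        score_v = float(col[rx])
    if max(score_h, score_v) > thresh:
        if score_h >= score_v:
            node["children"] = [subdivide(edges, x0, y0, x1, y0 + ry, min_size, thresh),
                                subdivide(edges, x0, y0 + ry, x1, y1, min_size, thresh)]
        else:
            node["children"] = [subdivide(edges, x0, y0, x0 + rx, y1, min_size, thresh),
                                subdivide(edges, x0 + rx, y0, x1, y1, min_size, thresh)]
    return node
```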
international conference on computer vision | 2009
Jianxiong Xiao; Long Quan
We propose a simple but powerful multi-view semantic segmentation framework for images captured by a camera mounted on a car driving along streets. In our approach, a pair-wise Markov Random Field (MRF) is laid out across multiple views. Both 2D and 3D features are extracted at a super-pixel level to train classifiers for the unary data terms of the MRF. For the smoothness terms, our approach makes use of color differences in the same image to identify accurate segmentation boundaries, and dense pixel-to-pixel correspondences to enforce consistency across different views. To speed up training and to improve the recognition quality, our approach adaptively selects the most similar training data for each scene from the label pool. Furthermore, we propose a powerful approach within the same framework to enable large-scale labeling in both 3D space and 2D images. We demonstrate our approach on more than 10,000 images from Google Maps Street View.
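Schematically, the energy implied by this abstract has the following form (the notation is ours, not the paper's): x_i are super-pixel labels, the unary terms come from the 2D/3D classifiers, intra-image smoothness is weighted by color difference, and inter-image consistency is enforced over dense correspondences.

```latex
E(\mathbf{x}) = \sum_{i} \psi_i(x_i)
  + \lambda \sum_{(i,j) \in \mathcal{N}_{\mathrm{intra}}}
      e^{-\beta \lVert c_i - c_j \rVert^2}\, \mathbf{1}[x_i \neq x_j]
  + \mu \sum_{(i,k) \in \mathcal{N}_{\mathrm{inter}}} \mathbf{1}[x_i \neq x_k]
```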
international conference on computer graphics and interactive techniques | 2008
Ping Tan; Tian Fang; Jianxiong Xiao; Peng Zhao; Long Quan
In this paper, we introduce a simple sketching method to generate a realistic 3D tree model from a single image. The user draws at least two strokes in the tree image: the first crown stroke around the tree crown to mark up the leaf region, the second branch stroke from the tree root to mark up the main trunk, and possibly a few other branch strokes for refinement. The method automatically generates a 3D tree model including branches and leaves. Branches are synthesized by a growth engine from a small library of elementary subtrees that are pre-defined or built on the fly from the recovered visible branches. The visible branches are automatically traced from the drawn branch strokes according to image statistics along the strokes. Leaves are generated from the region bounded by the first crown stroke to complete the tree. We demonstrate our method on a variety of examples.
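A heavily simplified 2D sketch of the growth-engine idea, under our own assumptions: elementary subtrees from a small library are repeatedly attached at open branch tips until growth leaves the user-drawn crown region. The paper grows full 3D branches and then populates the crown with leaves; everything below is illustrative only.

```python
import math
import random

# Each elementary subtree: a list of (angle offset, length scale) children.
LIBRARY = [
    [(-0.5, 0.7), (0.4, 0.8)],
    [(-0.3, 0.9), (0.0, 0.6), (0.5, 0.7)],
]

def grow(tip, angle, length, in_crown, depth=0, max_depth=6):
    """Recursively attach random elementary subtrees at the current tip;
    returns 2D line segments ((x0, y0), (x1, y1)) for the branches."""
    segments = []
    if depth >= max_depth or not in_crown(tip):
        return segments
    for da, ds in random.choice(LIBRARY):
        end = (tip[0] + length * ds * math.sin(angle + da),
               tip[1] + length * ds * math.cos(angle + da))
        segments.append((tip, end))
        segments += grow(end, angle + da, length * ds, in_crown, depth + 1, max_depth)
    return segments

# Usage: a circular crown region above the root at the origin.
# in_crown = lambda p: p[0] ** 2 + (p[1] - 2.0) ** 2 < 16.0
# branches = grow((0.0, 0.0), 0.0, 2.0, in_crown)
```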
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014
Phillip Isola; Jianxiong Xiao; Devi Parikh; Antonio Torralba; Aude Oliva
When glancing at a magazine, or browsing the Internet, we are continuously exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with some of their visual details. But not all images are equal in memory. Some stick in our minds while others are quickly forgotten. In this paper, we focus on the problem of predicting how memorable an image will be. We show that memorability is an intrinsic and stable property of an image that is shared across different viewers, and remains stable across delays. We introduce a database for which we have measured the probability that each picture will be recognized after a single view. We analyze a collection of image features, labels, and attributes that contribute to making an image memorable, and we train a predictor based on global image descriptors. We find that predicting image memorability is a task that can be addressed with current computer vision techniques. While making memorable images is a challenging task in visualization, photography, and education, this work is a first attempt to quantify this useful property of images.
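The prediction setup described above lends itself to a short sketch: regress a memorability score (the measured probability of recognition after a single view) from global image descriptors. The feature files and the choice of support vector regression below are our assumptions standing in for the paper's descriptors and training details.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

# Hypothetical precomputed inputs: one global descriptor per image and the
# measured memorability score for each image.
X = np.load("global_descriptors.npy")   # (n_images, d), e.g. GIST-like features
y = np.load("memorability_scores.npy")  # (n_images,), values in [0, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(kernel="rbf", C=1.0).fit(X_tr, y_tr)

# The paper evaluates with rank correlation; R^2 here is just a quick check.
print("held-out R^2:", model.score(X_te, y_te))
```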
international conference on computer vision | 2007
Jianxiong Xiao; Jingdong Wang; Ping Tan; Long Quan
A joint segmentation is a simultaneous segmentation of registered 2D images and the 3D points reconstructed from the multiple views. It is fundamental in structuring the data for subsequent modeling applications. In this paper, we treat this joint segmentation as a weighted graph labeling problem. First, we construct a 3D graph for the joint 3D and 2D points using a joint similarity measure. Second, we propose a hierarchical sparse affinity propagation algorithm to automatically and jointly segment the 2D images and group the 3D points. Third, a semi-supervised affinity propagation algorithm is proposed to refine the automatic results with user assistance. Finally, extensive experiments demonstrate the effectiveness of the proposed approaches.
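A minimal sketch of one level of the grouping idea, assuming a precomputed joint similarity over 3D points and their associated 2D appearance features. The paper's hierarchical sparse and semi-supervised variants go well beyond scikit-learn's dense implementation, which is used here purely for illustration.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def joint_similarity(pts3d, feats2d, w=0.5):
    """Negative squared distances blended over 3D position and 2D appearance."""
    d3 = ((pts3d[:, None, :] - pts3d[None, :, :]) ** 2).sum(-1)
    d2 = ((feats2d[:, None, :] - feats2d[None, :, :]) ** 2).sum(-1)
    return -(w * d3 + (1 - w) * d2)

# Toy data: 100 reconstructed points with 8-D appearance features each.
S = joint_similarity(np.random.rand(100, 3), np.random.rand(100, 8))
labels = AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(S)
```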
european conference on computer vision | 2010
Honghui Zhang; Jianxiong Xiao; Long Quan
In this paper, we propose a robust supervised label transfer method for the semantic segmentation of street scenes. Given an input image of a street scene, we first find multiple image sets in the annotated training database, each of which covers all semantic categories present in the input image. Then, we establish dense correspondence between the input image and each found image set with a proposed KNN-MRF matching scheme. This is followed by a correspondence classification step that uses models trained for different categories to prune semantically incorrect correspondences. From the correspondences classified as semantically correct, we infer the confidence of each super-pixel belonging to each semantic category, and integrate these confidences with a spatial smoothness constraint in a Markov random field to segment the input image. Experiments on three datasets show that our method outperforms traditional learning-based methods and a previous nonparametric label transfer method for the semantic segmentation of street scenes.
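A schematic sketch of the final inference step, under our own naming: correspondences that survive the classifier vote for the category of each super-pixel, and the resulting confidences feed the unary term of a Potts-style MRF. The KNN-MRF matching and per-category classifiers are assumed to have produced `matches` already.

```python
from collections import defaultdict

def superpixel_confidences(matches, n_categories):
    """matches: iterable of (superpixel_id, category, classifier_score) for
    correspondences classified as semantically correct."""
    votes = defaultdict(lambda: [0.0] * n_categories)
    for sp, cat, score in matches:
        votes[sp][cat] += score
    conf = {}
    for sp, v in votes.items():
        total = sum(v) or 1.0
        conf[sp] = [x / total for x in v]  # normalized per-category confidence
    return conf

# The MRF unary cost would then be something like -log(conf[sp][cat] + eps),
# combined with a pairwise smoothness penalty between adjacent super-pixels.
```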
european conference on computer vision | 2008
Jianxiong Xiao; Jingni Chen; Dit Yan Yeung; Long Quan
We propose a novel and efficient method for generic arbitrary-view object class detection and localization. In contrast to existing single-view and multi-view methods that use complicated mechanisms to relate the structural information in different parts of the objects or different viewpoints, we aim to represent the structural information at its true 3D locations. Uncalibrated multi-view images from a hand-held camera are used to reconstruct the 3D visual word models in the training stage. In the testing stage, beyond bounding boxes, our method can automatically determine the locations and outlines of multiple objects in the test image with occlusion handling, and can accurately estimate both the intrinsic and extrinsic camera parameters in an optimized way. With exemplar models, our method can also handle shape deformation for intra-class variance. To handle the large data sets arising from the models, we propose several speedup techniques to make prediction efficient. Experimental results on standard data sets demonstrate the effectiveness of the proposed approach.
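A minimal sketch of the training-stage notion of a "3D visual word model": descriptors of the SfM-reconstructed 3D points are quantized into a vocabulary, so each 3D location on the model carries a word id. The reconstruction itself and the detection stage are out of scope here; the file names and vocabulary size are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

descriptors = np.load("point_descriptors.npy")  # (n_points, 128), e.g. SIFT
points3d = np.load("points3d.npy")              # (n_points, 3)

vocab = KMeans(n_clusters=500, random_state=0).fit(descriptors)
word_ids = vocab.predict(descriptors)

# The model: each reconstructed 3D location paired with its visual word id.
visual_word_model = list(zip(points3d.tolist(), word_ids.tolist()))
```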
computer vision and pattern recognition | 2013
Yinda Zhang; Jianxiong Xiao; James Hays; Ping Tan
We significantly extrapolate the field of view of a photograph by learning from a roughly aligned, wide-angle guide image of the same scene category. Our method can extrapolate typical photos into complete panoramas. The extrapolation problem is formulated in the shift-map image synthesis framework. We analyze the self-similarity of the guide image to generate a set of allowable local transformations and apply them to the input image. Our guided shift-map method preserves the scene layout of the guide image when extrapolating a photograph. Conventional shift-map methods support only translations, which are not expressive enough to characterize the self-similarity of complex scenes. Therefore, we additionally allow image transformations of rotation, scaling, and reflection. To handle this increase in complexity, we introduce a hierarchical graph optimization method to choose the optimal transformation at each output pixel. We demonstrate our approach on a variety of indoor, outdoor, natural, and man-made scenes.
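A toy sketch of the enlarged transformation set described above: beyond pure translations, each candidate is a similarity transform combining rotation, scale, an optional reflection, and a shift. Scoring candidates by guide-image self-similarity and the hierarchical graph optimization over per-pixel choices are the paper's contributions and are only hinted at here; the parameter grids are our assumptions.

```python
import itertools
import numpy as np

angles = np.deg2rad([0, 90, 180, 270])
scales = [0.5, 1.0, 2.0]
flips = [1, -1]  # -1 mirrors horizontally

def candidate_transforms(shifts):
    """Yield 2x3 affine matrices combining rotation, scale, reflection, shift."""
    for a, s, f, (tx, ty) in itertools.product(angles, scales, flips, shifts):
        R = np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
        M = s * R @ np.diag([f, 1.0])  # reflect, then rotate and scale
        yield np.hstack([M, [[tx], [ty]]])

# Usage: e.g. shifts = [(0, 0), (16, 0), (0, 16)]; each output pixel would
# pick one candidate, and cv2.warpAffine can apply the chosen transform.
```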
international conference on computer graphics and interactive techniques | 2012
Aditya Khosla; Jianxiong Xiao; Phillip Isola; Antonio Torralba; Aude Oliva
When glancing at a magazine, or browsing the Internet, we are continuously being exposed to photographs. However, not all images are equal in memory; some stick in our minds, while others are forgotten. In this paper we discuss the notion of image memorability and the elements that make an image memorable. Our recent works have shown that image memorability is a stable and intrinsic property of images that is shared across different viewers. Given that this is the case, we discuss the possibility of modifying the memorability of images by identifying the memorability of image regions. Further, we introduce and provide evidence for the phenomenon of visual inception: can we make people believe they have seen an image they have not?