Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Angel X. Chang is active.

Publication


Featured research published by Angel X. Chang.


Computational Linguistics | 2013

Deterministic coreference resolution based on entity-centric, precision-ranked rules

Heeyoung Lee; Angel X. Chang; Yves Peirsman; Nathanael Chambers; Mihai Surdeanu; Daniel Jurafsky

We propose a new deterministic approach to coreference resolution that combines the global information and precise features of modern machine-learning models with the transparency and modularity of deterministic, rule-based systems. Our sieve architecture applies a battery of deterministic coreference models one at a time from highest to lowest precision, where each model builds on the previous model's cluster output. The two stages of our sieve-based architecture, a mention detection stage that heavily favors recall, followed by coreference sieves that are precision-oriented, offer a powerful way to achieve both high precision and high recall. Further, our approach makes use of global information through an entity-centric model that encourages the sharing of features across all mentions that point to the same real-world entity. Despite its simplicity, our approach gives state-of-the-art performance on several corpora and genres, and has also been incorporated into hybrid state-of-the-art coreference systems for Chinese and Arabic. Our system thus offers a new paradigm for combining knowledge in rule-based systems that has implications throughout computational linguistics.
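
Below is a minimal Python sketch of the multi-pass sieve idea described in the abstract: deterministic sieves are applied from highest to lowest precision, and each sieve operates on the entity clusters produced by the previous ones. The sieves and merging heuristics here are illustrative stand-ins, not the rules used in the published system.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Mention:
    text: str
    cluster: set = field(default_factory=set)  # shared set = current entity cluster

def merge(a: "Mention", b: "Mention") -> None:
    """Entity-centric merge: union the two clusters so features are shared."""
    union = a.cluster | b.cluster
    for m in union:
        m.cluster = union

def exact_match_sieve(mentions):
    """High-precision sieve: merge mentions with identical surface strings."""
    seen = {}
    for m in mentions:
        key = m.text.lower()
        if key in seen:
            merge(seen[key], m)
        else:
            seen[key] = m

def head_match_sieve(mentions):
    """Lower-precision sieve: merge mentions sharing a crude head word."""
    seen = {}
    for m in mentions:
        key = m.text.lower().split()[-1]
        if key in seen:
            merge(seen[key], m)
        else:
            seen[key] = m

def resolve(mentions, sieves):
    """Apply sieves from highest to lowest precision over shared clusters."""
    for m in mentions:
        m.cluster = {m}
    for sieve in sieves:
        sieve(mentions)
    return {frozenset(x.text for x in m.cluster) for m in mentions}

if __name__ == "__main__":
    mentions = [Mention("Barack Obama"), Mention("Obama"), Mention("the president")]
    print(resolve(mentions, [exact_match_sieve, head_match_sieve]))
```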


computer vision and pattern recognition | 2017

Semantic Scene Completion from a Single Depth Image

Shuran Song; Fisher Yu; Andy Zeng; Angel X. Chang; Manolis Savva; Thomas A. Funkhouser

This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our network uses a dilation-based 3D context module to efficiently expand the receptive field and enable 3D context learning. To train our network, we construct SUNCG: a manually created large-scale dataset of synthetic 3D scenes with dense volumetric annotations. Our experiments demonstrate that the joint model outperforms methods addressing each task in isolation and outperforms alternative approaches on the semantic scene completion task. The dataset and code are available at http://sscnet.cs.princeton.edu.
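
As a rough illustration of the dilation-based 3D context idea mentioned above, the sketch below (assuming PyTorch is available) stacks 3D convolutions with increasing dilation so the receptive field grows without shrinking the voxel grid, then predicts per-voxel occupancy and semantic logits. Channel counts, dilation rates, and the number of classes are placeholders, not the published SSCNet configuration.

```python
import torch
import torch.nn as nn

class Dilated3DContext(nn.Module):
    def __init__(self, channels: int = 16, num_classes: int = 12):
        super().__init__()
        # Increasing dilation widens the receptive field; padding keeps the grid size.
        self.block = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(inplace=True),
        )
        # Two heads: per-voxel occupancy and per-voxel semantic class logits.
        self.occupancy = nn.Conv3d(channels, 1, kernel_size=1)
        self.semantics = nn.Conv3d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.block(x)
        return self.occupancy(feats), self.semantics(feats)

if __name__ == "__main__":
    voxels = torch.randn(1, 16, 32, 32, 32)   # batch, channels, depth, height, width
    occ, sem = Dilated3DContext()(voxels)
    print(occ.shape, sem.shape)               # (1, 1, 32, 32, 32) and (1, 12, 32, 32, 32)
```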


computer vision and pattern recognition | 2017

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

Angela Dai; Angel X. Chang; Manolis Savva; Maciej Halber; Thomas A. Funkhouser; Matthias Nießner

A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available – current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.
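
For a concrete picture of what a richly annotated RGB-D scene record contains, here is a hypothetical Python sketch of the per-scene data the abstract describes: posed RGB-D frames, a surface reconstruction, and per-vertex semantic labels. The field names are illustrative and do not reflect the actual ScanNet file layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    color_path: str                    # RGB image
    depth_path: str                    # aligned depth image
    camera_pose: List[List[float]]     # 4x4 camera-to-world matrix

@dataclass
class Scene:
    scene_id: str
    frames: List[Frame]                # posed RGB-D views of this scene
    mesh_path: str                     # reconstructed surface mesh
    vertex_labels: List[int]           # semantic class per mesh vertex

def count_views(scenes: List[Scene]) -> int:
    """Total number of RGB-D views across a collection of scenes."""
    return sum(len(s.frames) for s in scenes)
```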


empirical methods in natural language processing | 2015

Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval

Sebastian Schuster; Ranjay Krishna; Angel X. Chang; Li Fei-Fei; Christopher D. Manning

Semantically complex queries which include attributes of objects and relations between objects still pose a major challenge to image retrieval systems. Recent work in computer vision has shown that a graph-based semantic representation called a scene graph is an effective representation for very detailed image descriptions and for complex queries for retrieval. In this paper, we show that scene graphs can be effectively created automatically from a natural language scene description. We present a rule-based and a classifier-based scene graph parser whose output can be used for image retrieval. We show that including relations and attributes in the query graph outperforms a model that only considers objects and that using the output of our parsers is almost as effective as using human-constructed scene graphs (Recall@10 of 27.1% vs. 33.4%). Additionally, we demonstrate the general usefulness of parsing to scene graphs by showing that the output can also be used to generate 3D scenes.
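
The toy Python parser below illustrates the rule-based flavour of scene graph construction in miniature: noun phrases become objects with attributes, and a small set of relation keywords becomes edges. It is a deliberately simplified stand-in; the parsers in the paper operate over full linguistic analyses rather than keyword splitting.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneGraph:
    objects: List[str] = field(default_factory=list)
    attributes: List[Tuple[str, str]] = field(default_factory=list)      # (object, attribute)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)

RELATIONS = ["next to", "on", "under", "behind"]   # tiny relation lexicon

def parse_phrase(phrase: str, graph: SceneGraph) -> str:
    """Rule: drop determiners, last token is the object, the rest are attributes."""
    tokens = [t for t in phrase.split() if t not in {"a", "an", "the"}]
    obj = tokens[-1]
    graph.objects.append(obj)
    graph.attributes += [(obj, attr) for attr in tokens[:-1]]
    return obj

def parse(description: str) -> SceneGraph:
    """Rule: split on the first relation keyword; both sides are noun phrases."""
    graph = SceneGraph()
    text = description.lower()
    for rel in RELATIONS:
        if f" {rel} " in text:
            left, right = text.split(f" {rel} ", 1)
            graph.relations.append((parse_phrase(left, graph), rel,
                                    parse_phrase(right, graph)))
            return graph
    parse_phrase(text, graph)   # no relation found: a single object phrase
    return graph

if __name__ == "__main__":
    print(parse("a red cup on the wooden table"))
```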


international conference on computer graphics and interactive techniques | 2014

SceneGrok: inferring action maps in 3D environments

Manolis Savva; Angel X. Chang; Pat Hanrahan; Matthew Fisher; Matthias Nießner

With modern computer graphics, we can generate enormous amounts of 3D scene data. It is now possible to capture high-quality 3D representations of large real-world environments. Large shape and scene databases, such as the Trimble 3D Warehouse, are publicly accessible and constantly growing. Unfortunately, while a great amount of 3D content exists, most of it is detached from the semantics and functionality of the objects it represents. In this paper, we present a method to establish a correlation between the geometry and the functionality of 3D environments. Using RGB-D sensors, we capture dense 3D reconstructions of real-world scenes, and observe and track people as they interact with the environment. With these observations, we train a classifier which can transfer interaction knowledge to unobserved 3D scenes. We predict a likelihood of a given action taking place over all locations in a 3D environment and refer to this representation as an action map over the scene. We demonstrate prediction of action maps in both 3D scans and virtual scenes. We evaluate our predictions against ground truth annotations by people, and present an approach for characterizing 3D scenes by functional similarity using action maps.
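
A minimal sketch of the action-map idea, under strong simplifying assumptions: every cell of a 2D floor grid gets simple geometric features (here, distance to the nearest chair and table), and a hand-set linear scorer stands in for the trained classifier. The features, weights, and example scene are invented for illustration.

```python
import math

def features(scene_objects, x, y):
    """Toy per-location features: distance to the nearest chair and nearest table."""
    def nearest(category):
        dists = [math.hypot(ox - x, oy - y)
                 for cat, ox, oy in scene_objects if cat == category]
        return min(dists, default=10.0)        # large distance if none present
    return [nearest("chair"), nearest("table")]

def sit_likelihood(feats, weights=(-1.2, -0.3), bias=2.0):
    """Hand-set linear scorer squashed to a probability (stand-in for the classifier)."""
    score = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-score))

def action_map(scene_objects, width=5, depth=5):
    """Likelihood of the action at every cell of a width x depth floor grid."""
    return [[sit_likelihood(features(scene_objects, x, y))
             for x in range(width)] for y in range(depth)]

if __name__ == "__main__":
    scene = [("chair", 1, 1), ("table", 1, 2)]   # (category, x, y) positions
    for row in action_map(scene):
        print(" ".join(f"{p:.2f}" for p in row))
```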


empirical methods in natural language processing | 2014

Learning Spatial Knowledge for Text to 3D Scene Generation

Angel X. Chang; Manolis Savva; Christopher D. Manning

We address the grounding of natural language to concrete spatial constraints, and inference of implicit pragmatics in 3D environments. We apply our approach to the task of text-to-3D scene generation. We present a representation for common sense spatial knowledge and an approach to extract it from 3D scene data. In text-to-3D scene generation, a user provides as input natural language text from which we extract explicit constraints on the objects that should appear in the scene. The main innovation of this work is to show how to augment these explicit constraints with learned spatial knowledge to infer missing objects and likely layouts for the objects in the scene. We demonstrate that spatial knowledge is useful for interpreting natural language and show examples of learned knowledge and generated 3D scenes.
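
The following small Python sketch shows one way learned spatial knowledge of this kind could be represented: support relations observed in a scene corpus are counted, and the resulting prior is used to infer an unstated support surface for an object mentioned in the text. The toy observations are made up; the paper learns richer priors from real 3D scene data.

```python
from collections import Counter, defaultdict

def learn_support_prior(observations):
    """observations: (child_category, parent_category) support pairs from 3D scenes."""
    prior = defaultdict(Counter)
    for child, parent in observations:
        prior[child][parent] += 1
    return prior

def most_likely_support(prior, category):
    """Infer the most probable supporting surface for an object category."""
    if category not in prior:
        return "floor"                           # fallback assumption for unseen categories
    return prior[category].most_common(1)[0][0]

if __name__ == "__main__":
    observations = [("lamp", "desk"), ("lamp", "desk"), ("lamp", "floor"),
                    ("keyboard", "desk"), ("chair", "floor")]
    prior = learn_support_prior(observations)
    # "Put a lamp in the room" -> infer the unstated constraint lamp-on-desk.
    print(most_likely_support(prior, "lamp"))    # desk
    print(most_likely_support(prior, "plant"))   # floor (unseen category)
```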


international conference on computer graphics and interactive techniques | 2016

PiGraphs: learning interaction snapshots from observations

Manolis Savva; Angel X. Chang; Pat Hanrahan; Matthew Fisher; Matthias Nießner

We learn a probabilistic model connecting human poses and arrangements of object geometry from real-world observations of interactions collected with commodity RGB-D sensors. This model is encoded as a set of prototypical interaction graphs (PiGraphs), a human-centric representation capturing physical contact and visual attention linkages between 3D geometry and human body parts. We use this encoding of the joint probability distribution over pose and geometry during everyday interactions to generate interaction snapshots, which are static depictions of human poses and relevant objects during human-object interactions. We demonstrate that our model enables a novel human-centric understanding of 3D content and allows for jointly generating 3D scenes and interaction poses given terse high-level specifications, natural language, or reconstructed real-world scene constraints.
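
To make the representation concrete, here is a hypothetical sketch of an interaction graph in Python: nodes for body parts and object parts, with separate edge lists for physical contact and visual attention. The node and edge names are illustrative, not the actual PiGraphs encoding, which is probabilistic and learned from observations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InteractionGraph:
    body_parts: List[str]
    object_parts: List[str]
    contact: List[Tuple[str, str]] = field(default_factory=list)   # (body part, object part)
    gaze: List[Tuple[str, str]] = field(default_factory=list)      # (body part, object part)

def sitting_at_desk() -> InteractionGraph:
    """A single illustrative interaction: a person sitting at a desk."""
    g = InteractionGraph(
        body_parts=["pelvis", "back", "hands", "head"],
        object_parts=["chair_seat", "chair_back", "desk_top", "monitor"],
    )
    g.contact += [("pelvis", "chair_seat"), ("back", "chair_back"), ("hands", "desk_top")]
    g.gaze += [("head", "monitor")]
    return g

if __name__ == "__main__":
    print(sitting_at_desk())
```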


international conference on computer graphics and interactive techniques | 2014

On being the right scale: sizing large collections of 3D models

Manolis Savva; Angel X. Chang; Gilbert Louis Bernstein; Christopher D. Manning; Pat Hanrahan

We address the problem of recovering reliable sizes for a collection of models defined using scales with unknown correspondence to physical units. Our algorithmic approach provides absolute size estimates for 3D models by combining category-based size priors and size observations from 3D scenes. Our approach handles unobserved 3D models without any user intervention. It also scales to large public 3D model databases and is appropriate for handling the open-world problem of rapidly expanding collections of 3D models. We use two datasets from online 3D model repositories to evaluate against both human judgments of size and ground truth physical sizes of 3D models, and find that an algorithmic approach can predict sizes more accurately than people.
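
A minimal sketch of the prior-plus-observations idea, assuming a simple Gaussian model over log sizes: a category-level prior is updated with noisy per-model size observations from scenes, and the posterior mean gives an absolute size estimate. The numbers and the conjugate-Gaussian form are illustrative assumptions, not the paper's exact formulation.

```python
import math

def posterior_log_size(prior_mean, prior_var, observations, obs_var):
    """Conjugate Gaussian update of a log-size prior with noisy observations."""
    precision = 1.0 / prior_var
    weighted = prior_mean / prior_var
    for obs in observations:
        precision += 1.0 / obs_var
        weighted += obs / obs_var
    return weighted / precision        # posterior mean in log-space

if __name__ == "__main__":
    # Category prior: chairs around 1.0 m tall, with broad uncertainty (log-space).
    prior_mean, prior_var = math.log(1.0), 0.25
    # Two scene observations of this particular model, in metres.
    obs = [math.log(0.85), math.log(0.9)]
    est = math.exp(posterior_log_size(prior_mean, prior_var, obs, obs_var=0.05))
    print(f"estimated height: {est:.2f} m")
```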


computer vision and pattern recognition | 2015

Semantically-enriched 3D models for common-sense knowledge

Manolis Savva; Angel X. Chang; Pat Hanrahan

We identify and connect a set of physical properties to 3D models to create a richly-annotated 3D model dataset with data on physical sizes, static support, attachment surfaces, material compositions, and weights. To collect these physical property priors, we leverage observations of 3D models within 3D scenes and information from images and text. By augmenting 3D models with these properties we create a semantically rich, multi-layered dataset of common indoor objects. We demonstrate the usefulness of these annotations for improving 3D scene synthesis systems, enabling faceted semantic queries into 3D model datasets, and reasoning about how objects can be manipulated by people using weight and static friction estimates.
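
Below is an illustrative Python record for such a property-annotated model, together with a tiny faceted query over a collection of records. Property names, units, and example values are hypothetical; they only show how annotations of this kind enable semantic filtering.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AnnotatedModel:
    model_id: str
    category: str
    size_m: Tuple[float, float, float]   # (width, depth, height) in metres
    weight_kg: float
    materials: List[str]
    static_support: str                  # what the object typically rests on

def faceted_query(models, category=None, max_weight_kg=None, material=None):
    """Filter a model collection on semantic facets."""
    results = models
    if category is not None:
        results = [m for m in results if m.category == category]
    if max_weight_kg is not None:
        results = [m for m in results if m.weight_kg <= max_weight_kg]
    if material is not None:
        results = [m for m in results if material in m.materials]
    return results

if __name__ == "__main__":
    models = [
        AnnotatedModel("m1", "chair", (0.5, 0.5, 0.9), 6.0, ["wood"], "floor"),
        AnnotatedModel("m2", "mug", (0.08, 0.08, 0.1), 0.3, ["ceramic"], "table"),
    ]
    # "Light wooden objects a person could lift."
    print(faceted_query(models, max_weight_kg=10, material="wood"))
```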


meeting of the association for computational linguistics | 2014

Semantic Parsing for Text to 3D Scene Generation

Angel X. Chang; Manolis Savva; Christopher D. Manning

We propose text-to-scene generation as an application for semantic parsing. This is an application that grounds semantics in a virtual world that requires understanding of common, everyday language. In text to scene generation, the user provides a textual description and the system generates a 3D scene. For example, Figure 1 shows the generated scene for the input text “there is a room with a chair and a computer”. This is a challenging, open-ended problem that prior work has only addressed in a limited way. Most of the technical challenges in text to scene generation stem from the difficulty of mapping language to formal representations of visual scenes, as well as an overall absence of real world spatial knowledge from current NLP systems. These issues are partly due to the omission in natural language of many facts about the world. When people describe scenes in text, they typically specify only important, relevant information. Many common sense facts are unstated (e.g., chairs and desks are typically on the floor). Therefore, we focus on inferring implicit relations that are likely to hold even if they are not explicitly stated by the input text. Text to scene generation offers a rich, interactive environment for grounded language that is familiar to everyone. The entities are common, everyday objects, and the knowledge necessary to address this problem is of general use across many domains. We present a system that leverages user interaction with 3D scenes to generate training data for semantic parsing approaches. Previous semantic parsing work has dealt with grounding text to physical attributes and relations (Matuszek et al., 2012; Krishnamurthy and Kollar, 2013), generating text for referring to objects (FitzGerald et al., 2013) and with connecting language to spatial relationships (Golland et al., 2010; Artzi and Zettlemoyer, 2013). Semantic parsing methods can also be applied to many aspects of text to scene generation. Furthermore, work on parsing instructions to robots (Matuszek et al., 2013; Tellex et al., 2014) has analogues in the context of discourse about physical scenes. In this extended abstract, we formalize the text to scene generation problem and describe it as a task for semantic parsing methods. To motivate this problem, we present a prototype system that incorporates simple spatial knowledge, and parses natural text to a semantic representation. By learning priors on spatial knowledge (e.g., typical positions of objects, and common spatial relations) our system addresses inference of implicit spatial constraints. The user can interactively manipulate the generated scene with textual commands, enabling us to refine and expand learned priors. Our current system uses deterministic rules to map text to a scene representation but we plan to explore training a semantic parser from data. We can leverage our system to collect user interactions for training data. Crowdsourcing is a promising avenue for obtaining a large scale dataset.
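
As a concrete, deliberately simplified illustration, the sketch below builds a scene-template representation for the example sentence from the abstract: explicit constraints extracted by deterministic keyword rules are kept separate from implicit constraints and missing objects inferred from common-sense priors. All rule contents and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneTemplate:
    objects: List[str]
    explicit: List[Tuple[str, str, str]] = field(default_factory=list)   # stated in the text
    implicit: List[Tuple[str, str, str]] = field(default_factory=list)   # inferred from priors

def build_template(text: str) -> SceneTemplate:
    known = ["room", "chair", "computer", "desk"]          # toy lexicon
    objects = [w for w in known if w in text.lower()]
    template = SceneTemplate(
        objects=objects,
        explicit=[(o, "inside", "room") for o in objects if o != "room"],
    )
    # Implicit common-sense constraints a learned spatial prior could supply.
    if "computer" in template.objects and "desk" not in template.objects:
        template.objects.append("desk")                    # inferred missing object
    if "computer" in template.objects:
        template.implicit += [("computer", "on", "desk"), ("desk", "on", "floor")]
    if "chair" in template.objects:
        template.implicit += [("chair", "on", "floor")]
    return template

if __name__ == "__main__":
    print(build_template("there is a room with a chair and a computer"))
```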

Collaboration


Dive into Angel X. Chang's collaborations.

Top Co-Authors

Eneko Agirre

University of the Basque Country
