Is this you? Create Your Porfile

Yixin Zhu

University of California, Los Angeles

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Yixin Zhu is active.

Explore More

Publication

Featured researches published by Yixin Zhu.

computer vision and pattern recognition | 2015

Understanding tools: Task-oriented object modeling, learning and recognition

Yixin Zhu; Yibiao Zhao; Song-Chun Zhu

In this paper, we present a new framework - task-oriented modeling, learning and recognition which aims at understanding the underlying functions, physics and causality in using objects as “tools”. Given a task, such as, cracking a nut or painting a wall, we represent each object, e.g. a hammer or brush, in a generative spatio-temporal representation consisting of four components: i) an affordances basis to be grasped by hand; ii) a functional basis to act on a target object (the nut), iii) the imagined actions with typical motion trajectories; and iv) the underlying physical concepts, e.g. force, pressure, etc. In a learning phase, our algorithm observes only one RGB-D video, in which a rational human picks up one object (i.e. tool) among a number of candidates to accomplish the task. From this example, our algorithm learns the essential physical concepts in the task (e.g. forces in cracking nuts). In an inference phase, our algorithm is given a new set of objects (daily objects or stones), and picks the best choice available together with the inferred affordance basis, functional basis, imagined human actions (sequence of poses), and the expected physical quantity that it will produce. From this new perspective, any objects can be viewed as a hammer or a shovel, and object recognition is not merely memorizing typical appearance examples for each category but reasoning the physical mechanisms in various tasks to achieve generalization.

computer vision and pattern recognition | 2016

Inferring Forces and Learning Human Utilities from Videos

Yixin Zhu; Chenfanfu Jiang; Yibiao Zhao; Demetri Terzopoulos; Song-Chun Zhu

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of peoples interaction with objects. Rather than defining affordance in terms of the geometric compatibility between body poses and 3D objects, we devise algorithms that employ physicsbased simulation to infer the relevant forces/pressures acting on body parts. By observing the choices people make in videos (particularly in selecting a chair in which to sit) our system learns the comfort intervals of the forces exerted on body parts (while sitting). We account for peoples preferences in terms of human utilities, which transcend comfort intervals to account also for meaningful tasks within scenes and spatiotemporal constraints in motion planning, such as for the purposes of robot task planning.

international conference on computer graphics and interactive techniques | 2016

A virtual reality platform for dynamic human-scene interaction

Jenny Lin; Xingwen Guo; Jingyu Shao; Chenfanfu Jiang; Yixin Zhu; Song-Chun Zhu

Both synthetic static and simulated dynamic 3D scene data is highly useful in the fields of computer vision and robot task planning. Yet their virtual nature makes it difficult for real agents to interact with such data in an intuitive way. Thus currently available datasets are either static or greatly simplified in terms of interactions and dynamics. In this paper, we propose a system in which Virtual Reality and human / finger pose tracking is integrated to allow agents to interact with virtual environments in real time. Segmented object and scene data is used to construct a scene within Unreal Engine 4, a physics-based game engine. We then use the Oculus Rift headset with a Kinect sensor, Leap Motion controller and a dance pad to navigate and manipulate objects inside synthetic scenes in real time. We demonstrate how our system can be used to construct a multi-jointed agent representation as well as fine-grained finger pose. In the end, we propose how our system can be used for robot task planning and image semantic segmentation.

asian conference on computer vision | 2016

Classification of Lung Nodule Malignancy Risk on Computed Tomography Images Using Convolutional Neural Network: A Comparison Between 2D and 3D Strategies

Xingjian Yan; Jianing Pang; Hang Qi; Yixin Zhu; Chunxue Bai; Xin Geng; Mina Liu; Demetri Terzopoulos; Xiaowei Ding

Computed tomography (CT) is the preferred method for non-invasive lung cancer screening. Early detection of potentially malignant lung nodules will greatly improve patient outcome, where an effective computer-aided diagnosis (CAD) system may play an important role. Two-dimensional convolutional neural network (CNN) based CAD methods have been proposed and well-studied to extract hierarchical and discriminative features for classifying lung nodules. It is often questioned if the transition to 3D will be a key to major step forward in performance. In this paper, we propose a novel 3D CNN on the 1018-patient Lung Image Database Consortium collection (LIDC-IDRI). To the best of our knowledge, this is the first time to directly compare three different strategies: slice-level 2D CNN, nodule-level 2D CNN and nodule-level 3D CNN. Using comparable network architectures, we achieved nodule malignancy risk classification accuracies of \(86.7\%\), \(87.3\%\) and \(87.4\%\) against the personal opinion of four radiologists, respectively. In the experiments, our results and analyses demonstrates that the nodule-level 2D CNN can better capture the z-direction features of lung nodule than a slice-level 2D approach, whereas nodule-level 3D CNN can further integrate nodule-level features as well as context features from all three directions in a 3D patch in a limited extent, resulting in a slightly better performance than the other two strategies.

international conference on computer graphics and interactive techniques | 2018

A Moving Least Squares Material Point Method with Displacement Discontinuity and Two-Way Rigid Body Coupling

Yuanming Hu; Yu Fang; Ziheng Ge; Ziyin Qu; Yixin Zhu; Andre Pradhana; Chenfanfu Jiang

In this paper, we introduce the Moving Least Squares Material Point Method (MLS-MPM). MLS-MPM naturally leads to the formulation of Affine Particle-In-Cell (APIC) [Jiang et al. 2015] and Polynomial Particle-In-Cell [Fu et al. 2017] in a way that is consistent with a Galerkin-style weak form discretization of the governing equations. Additionally, it enables a new stress divergence discretization that effortlessly allows all MPM simulations to run two times faster than before. We also develop a Compatible Particle-In-Cell (CPIC) algorithm on top of MLS-MPM. Utilizing a colored distance field representation and a novel compatibility condition for particles and grid nodes, our framework enables the simulation of various new phenomena that are not previously supported by MPM, including material cutting, dynamic open boundaries, and two-way coupling with rigid bodies. MLS-MPM with CPIC is easy to implement and friendly to performance optimization.

International Journal of Computer Vision | 2018

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

Chenfanfu Jiang; Siyuan Qi; Yixin Zhu; Siyuan Huang; Jenny Lin; Lap-Fai Yu; Demetri Terzopoulos; Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity, and material information (detailed to object parts), as well as environments (e.g., illuminations and camera viewpoints). We demonstrate the value of our synthesized dataset, by improving performance in certain machine-learning-based scene understanding tasks—depth and surface normal prediction, semantic segmentation, reconstruction, etc.—and by providing benchmarks for and diagnostics of trained models by modifying object attributes and scene properties in a controllable manner.

international conference on computer graphics and interactive techniques | 2016

Evaluating physical quantities and learning human utilities from RGBD videos

Yixin Zhu; Chenfanfu Jiang; Yibiao Zhao; Demetri Terzopoulos; Song-Chun Zhu

We propose a notion of affordance that takes into account physical quantities generated when the human body interacts with real-world objects, and introduce a learning framework that incorporates the concept of human utilities, which in our opinion provides a deeper and finer-grained account not only of object affordance but also of peoples interaction with objects. Rather than defining affordance in terms of the geometric compatibility between body poses and 3D objects, we devise algorithms that employ physics-based simulation to infer the relevant forces/pressures acting on body parts. By observing the choices people make in videos (particularly in selecting a chair in which to sit) our system learns the comfort intervals of the forces exerted on body parts (while sitting). We account for peoples preferences in terms of human utilities, which transcend comfort intervals to account also for meaningful tasks within scenes and spatiotemporal constraints in motion planning, such as for the purposes of robot task planning.

Cognitive Science | 2015