Publication


Featured research published by Dahua Lin.


International Conference on Computer Vision (ICCV) | 2013

Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

Dahua Lin; Sanja Fidler; Raquel Urtasun

In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
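
The coupling of scene classification and object recognition described above can be illustrated with a minimal, hypothetical sketch: a tiny CRF-style model where scene and object labels are inferred jointly by exhaustive MAP search. All potentials below are invented stand-ins, not the paper's learned ones.

```python
import itertools
import numpy as np

# Toy joint inference: 2 candidate cuboids, 3 object classes, 2 scene types.
# All scores are made-up unary potentials and scene-object compatibilities.
rng = np.random.default_rng(0)
obj_unary = rng.normal(size=(2, 3))    # per-cuboid object-class scores
scene_unary = rng.normal(size=2)       # scene-type scores
compat = rng.normal(size=(2, 3))       # compat[s, c]: scene s vs object class c

def energy(scene, labels):
    """Higher = better: unaries plus scene-object compatibility terms."""
    score = scene_unary[scene]
    for cuboid, cls in enumerate(labels):
        score += obj_unary[cuboid, cls] + compat[scene, cls]
    return score

# Exhaustive MAP inference over the tiny joint state space; because the
# compatibility term couples them, the scene choice can flip object labels.
best = max(
    ((s, lbls) for s in range(2)
     for lbls in itertools.product(range(3), repeat=2)),
    key=lambda sl: energy(*sl),
)
print("MAP scene:", best[0], "object labels:", best[1])
```

Real inference in the paper operates on CPMC-derived cuboid candidates with learned potentials; brute force here simply makes the joint coupling visible.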


Computer Vision and Pattern Recognition (CVPR) | 2014

What Are You Talking About? Text-to-Image Coreference

Chen Kong; Dahua Lin; Mohit Bansal; Raquel Urtasun; Sanja Fidler

In this paper we exploit natural sentential descriptions of RGB-D scenes in order to improve 3D semantic parsing. Importantly, in doing so, we reason about which particular object each noun/pronoun is referring to in the image. This allows us to utilize visual information in order to disambiguate the so-called coreference resolution problem that arises in text. Towards this goal, we propose a structure prediction model that exploits potentials computed from text and RGB-D imagery to reason about the class of the 3D objects, the scene type, as well as to align the nouns/pronouns with the referred visual objects. We demonstrate the effectiveness of our approach on the challenging NYU-RGBD v2 dataset, which we enrich with natural lingual descriptions. We show that our approach significantly improves 3D detection and scene classification accuracy, and is able to reliably estimate the text-to-image alignment. Furthermore, by using textual and visual information, we are also able to successfully deal with coreference in text, improving upon the state-of-the-art Stanford coreference system [15].


Computer Vision and Pattern Recognition (CVPR) | 2014

Visual Semantic Search: Retrieving Videos via Complex Textual Queries

Dahua Lin; Sanja Fidler; Chen Kong; Raquel Urtasun

In this paper, we tackle the problem of retrieving videos using complex natural language queries. Towards this goal, we first parse the sentential descriptions into a semantic graph, which is then matched to visual concepts using a generalized bipartite matching algorithm. Our approach exploits object appearance, motion and spatial relations, and learns the importance of each term using structure prediction. We demonstrate the effectiveness of our approach on a new dataset designed for semantic search in the context of autonomous driving, which exhibits complex and highly dynamic scenes with many objects. We show that our approach is able to locate a major portion of the objects described in the query with high accuracy, and improve the relevance in video retrieval.
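
The core matching step can be sketched in miniature: assign each parsed query term to a detected visual concept so that total similarity is maximized. The similarity scores below are invented for illustration, and the brute-force search stands in for the paper's generalized bipartite matching.

```python
import itertools

# Toy similarity matrix: rows = query terms, cols = candidate video objects.
# Values are hypothetical scores, not outputs of any real detector.
similarity = [
    [0.9, 0.1, 0.2],   # "car"
    [0.2, 0.8, 0.1],   # "pedestrian"
    [0.1, 0.3, 0.7],   # "traffic light"
]

def best_matching(sim):
    """Brute-force maximum-weight bipartite matching (fine for tiny instances)."""
    n = len(sim)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )

assignment = best_matching(similarity)
print(assignment)  # term i is matched to object assignment[i] -> (0, 1, 2)
```

At realistic scale one would use a polynomial-time solver (e.g. the Hungarian algorithm) rather than enumerating permutations.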


Computer Vision and Pattern Recognition (CVPR) | 2009

Learning visual flows: A Lie algebraic approach

Dahua Lin; W. Eric L. Grimson; John W. Fisher

We present a novel method for modeling dynamic visual phenomena, which consists of two key aspects. First, the integral motion of constituent elements in a dynamic scene is captured by a common underlying geometric transform process. Second, a Lie algebraic representation of the transform process is introduced, which maps the transformation group to a vector space, and thus overcomes the difficulties due to the group structure. Consequently, statistical learning techniques based on vector spaces can be readily applied. Moreover, we discuss the intrinsic connections between the Lie algebra and linear dynamical processes, showing that our model induces spatially varying fields that can be estimated from local motions without continuous tracking. Following this, we further develop a statistical framework to robustly learn the flow models from noisy and partially corrupted observations. The proposed methodology is demonstrated on real-world phenomena, inferring common motion patterns from surveillance videos of crowded scenes and satellite data of weather evolution.
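
The benefit of moving to the Lie algebra can be shown with a minimal sketch for 2D rotations: averaging in the algebra (a vector space) stays on the group, whereas naive entrywise averaging of the matrices does not. This toy example only illustrates the vector-space property the abstract invokes, not the paper's flow-learning framework.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_rot(R):
    """Lie-algebra coordinate of a 2D rotation (its angle)."""
    return np.arctan2(R[1, 0], R[0, 0])

R1, R2 = rot(0.4), rot(1.2)

# Average in the Lie algebra, then map back with the exponential (here: rot).
alg_mean = rot((log_rot(R1) + log_rot(R2)) / 2)
# Naive entrywise average of the matrices.
naive_mean = (R1 + R2) / 2

print(np.linalg.det(alg_mean))    # 1.0: still a valid rotation
print(np.linalg.det(naive_mean))  # < 1.0: has left the rotation group
```

For general transformation groups the same idea uses the matrix logarithm and exponential; the 2D rotation case just makes the computation explicit.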


Computer Vision and Pattern Recognition (CVPR) | 2017

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

Xingcheng Zhang; Zhizhong Li; Chen Change Loy; Dahua Lin

A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improving the performance of image recognition. In our study, however, we observed difficulties along both directions. On one hand, the pursuit of very deep networks is met with diminishing returns and increased training difficulty; on the other hand, widening a network results in quadratic growth in both computational cost and memory demand. These difficulties motivate us to explore structural diversity in designing deep networks, a new dimension beyond just depth and width. Specifically, we present a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in composition as replacements of different parts of a network. Choosing PolyInception modules with the guidance of architectural efficiency can improve the expressive power while preserving comparable computational cost. The Very Deep PolyNet, designed following this direction, demonstrates substantial improvements over the state-of-the-art on the ILSVRC 2012 benchmark. Compared to Inception-ResNet-v2, it reduces the top-5 validation error on single crops from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.
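
The structural idea behind PolyInception can be sketched abstractly: where a residual unit computes I + F, a second-order "poly-2" unit composes the same block with itself, I + F + F∘F. The toy transform F below is a made-up stand-in for an Inception block, just to show the composition pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4, 4))

def F(x):
    """Hypothetical stand-in for an Inception block: one toy nonlinear map."""
    return np.tanh(W @ x)

def residual_unit(x):
    """First-order residual form: I + F."""
    return x + F(x)

def poly2_unit(x):
    """Second-order poly-2 form: I + F + F(F), sharing the same block F."""
    fx = F(x)
    return x + fx + F(fx)

x = rng.normal(size=4)
print(residual_unit(x))
print(poly2_unit(x))
```

Note that poly-2 reuses one block F twice rather than adding an independent second block, which is what distinguishes the polynomial composition from simply stacking two residual units.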


International Conference on Computer Vision (ICCV) | 2013

Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

Dahua Lin; Jianxiong Xiao

In this paper, we develop a generative model to describe the layouts of outdoor scenes - the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called the Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones lies in its ability to capture dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
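
The central construction - smooth latent maps turned into per-pixel topic distributions - can be sketched minimally. Here smoothed white noise stands in for samples from the coupled Gaussian processes, and a per-pixel softmax yields topic probabilities that vary continuously over the image plane; everything is an illustrative assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, K = 32, 32, 3     # image grid and number of topics

# Stand-in for Gaussian-process samples: white noise smoothed by repeated
# local averaging, giving one smooth latent map per topic.
maps = rng.normal(size=(K, H, W))
for _ in range(10):
    maps = 0.25 * (np.roll(maps, 1, axis=1) + np.roll(maps, -1, axis=1)
                   + np.roll(maps, 1, axis=2) + np.roll(maps, -1, axis=2))

# Per-pixel softmax couples the K latent maps into a topic distribution
# that varies continuously across the image plane.
expm = np.exp(maps - maps.max(axis=0))
topic_probs = expm / expm.sum(axis=0)

print(topic_probs.shape)   # (3, 32, 32): a distribution over topics per pixel
```

Nearby pixels end up with similar topic distributions because the latent maps are smooth, which is the layout-coherence property the model is built around.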


Computer Vision and Pattern Recognition (CVPR) | 2012

Manifold guided composite of Markov random fields for image modeling

Dahua Lin; John W. Fisher

We present a new generative image model, integrating techniques arising from two different domains: manifold modeling and Markov random fields. First, we develop a probabilistic model with a mixture of hyperplanes to approximate the manifold of orientable image patches, and demonstrate that it is more effective than the field of experts in expressing local texture patterns. Next, we develop a construction that yields an MRF for coherent image generation, given a configuration of local patch models, and thereby establish a prior distribution over an MRF space. Taking advantage of the model structure, we derive a variational inference algorithm, and apply it to low-level vision. In contrast to previous methods that rely on a single MRF, the method infers an approximate posterior distribution of MRFs, and recovers the underlying images by combining the predictions in a Bayesian fashion. Experiments quantitatively demonstrate superior performance as compared to state-of-the-art methods on image denoising and inpainting.


Computer Vision and Pattern Recognition (CVPR) | 2017

Discover and Learn New Objects from Documentaries

Kai Chen; Hang Song; Chen Change Loy; Dahua Lin

Despite the remarkable progress in recent years, detecting objects in a new context remains a challenging task. Detectors learned from a public dataset can only work with a fixed list of categories, while training from scratch usually requires a large amount of training data with detailed annotations. This work aims to explore a novel approach – learning object detectors from documentary films in a weakly supervised manner. This is inspired by the observation that documentaries often provide dedicated exposition of certain object categories, where visual presentations are aligned with subtitles. We believe that object detectors can be learned from such a rich source of information. Towards this goal, we develop a joint probabilistic framework, where individual pieces of information, including video frames and subtitles, are brought together via both visual and linguistic links. On top of this formulation, we further derive a weakly supervised learning algorithm, where object model learning and training set mining are unified in an optimization procedure. Experimental results on a real world dataset demonstrate that this is an effective approach to learning new object detectors.


European Conference on Computer Vision (ECCV) | 2012

Learning Deformations with Parallel Transport

Donglai Wei; Dahua Lin; John W. Fisher

Many vision problems, such as object recognition and image synthesis, are greatly impacted by deformation of objects. In this paper, we develop a deformation model based on Lie algebraic analysis. This work aims to provide a generative model that explicitly decouples deformation from appearance, which is fundamentally different from prior work that focuses on deformation-resilient features or metrics. Specifically, the deformation group for each object can be characterized by a set of Lie algebraic basis elements. Such bases for different objects are related via parallel transport. Exploiting the parallel transport relations, we formulate an optimization problem, and derive an algorithm that jointly estimates the deformation basis for a class of objects, given a set of images resulting from the action of the deformations. We test the proposed model empirically on both character recognition and face synthesis.


Computer Vision and Pattern Recognition (CVPR) | 2012

Low level vision via switchable Markov random fields

Dahua Lin; John W. Fisher

Markov random fields play a central role in solving a variety of low level vision problems, including denoising, in-painting, segmentation, and motion estimation. Much previous work was based on MRFs with hand-crafted networks, yet the underlying graphical structure is rarely explored. In this paper, we show that if appropriately estimated, the MRF's graphical structure, which captures significant information about appearance and motion, can provide crucial guidance to low level vision tasks. Motivated by this observation, we propose a principled framework to solve low level vision tasks via an exponential family of MRFs with variable structures, which we call Switchable MRFs. The approach explicitly seeks a structure that optimally adapts to the image or video in pursuit of task-specific goals. Through theoretical analysis and experimental study, we demonstrate that the proposed method addresses a number of drawbacks suffered by previous methods, including failure to capture heavy-tail statistics, computational difficulties, and lack of generality.

Collaboration


An overview of Dahua Lin's collaborations and top co-authors.

Top Co-Authors

John W. Fisher
Massachusetts Institute of Technology

Chen Change Loy
The Chinese University of Hong Kong

Xiaoou Tang
The Chinese University of Hong Kong

Chen Kong
Carnegie Mellon University

Eric Grimson
Massachusetts Institute of Technology

Limin Wang
The Chinese University of Hong Kong

Qingqiu Huang
The Chinese University of Hong Kong

Yu Xiong
The Chinese University of Hong Kong