Publication


Featured research published by De-An Huang.


International Conference on Computer Vision | 2013

Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition

De-An Huang; Yu-Chiang Frank Wang

Cross-domain image synthesis and recognition are typically considered two distinct tasks in the areas of computer vision and pattern recognition. As a result, it is not clear whether approaches addressing one task can be easily generalized or extended to solve the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed model not only derives a common feature space for associating cross-domain image data for recognition purposes, but also uses this feature space to jointly update the dictionaries in each image domain for improved representation. This is why our method can be applied to both cross-domain image synthesis and recognition problems. Experiments on a variety of synthesis and recognition tasks, such as single-image super-resolution, cross-view action recognition, and sketch-to-photo face recognition, verify the effectiveness of our proposed learning model.
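
To make the coupled learning concrete, here is a minimal sketch under simplifying assumptions (synthetic data, hypothetical dimensions, and a plain OMP coder): it alternates sparse coding with a single code shared across the two stacked domains and per-domain least-squares dictionary updates. This illustrates the idea rather than the authors' exact formulation.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_atoms, dim, n_samples = 32, 64, 400
X = rng.standard_normal((dim, n_samples))   # domain A (e.g., photo features)
Y = rng.standard_normal((dim, n_samples))   # paired domain B (e.g., sketches)

def joint_normalize(Da, Db):
    # Scale each atom by its norm in the stacked (coupled) dictionary
    # so the two domains stay consistently scaled.
    D = np.vstack([Da, Db])
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return D[:dim], D[dim:]

Da, Db = joint_normalize(rng.standard_normal((dim, n_atoms)),
                         rng.standard_normal((dim, n_atoms)))

for _ in range(10):
    # 1) Sparse coding with one code S shared by both domains: a single
    #    code must reconstruct X and Y simultaneously.
    coder = SparseCoder(dictionary=np.vstack([Da, Db]).T,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=5)
    S = coder.transform(np.vstack([X, Y]).T).T        # (n_atoms, n_samples)
    # 2) Per-domain dictionary update by least squares on the shared code.
    G = S @ S.T + 1e-6 * np.eye(n_atoms)
    Da, Db = joint_normalize(X @ S.T @ np.linalg.inv(G),
                             Y @ S.T @ np.linalg.inv(G))

# Cross-domain synthesis: code a domain-A sample with Da, decode with Db.
coder_a = SparseCoder(dictionary=Da.T, transform_algorithm="omp",
                      transform_n_nonzero_coefs=5)
y_hat = Db @ coder_a.transform(X[:, :1].T).T          # synthesized B sample
```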


IEEE Transactions on Multimedia | 2014

Self-Learning Based Image Decomposition With Applications to Single Image Denoising

De-An Huang; Li-Wei Kang; Yu-Chiang Frank Wang; Chia-Wen Lin

Decomposition of an image into multiple semantic components has been an active research topic for various image processing applications such as image denoising, enhancement, and inpainting. In this paper, we present a novel self-learning based image decomposition framework. Building on the recent success of sparse representation, the proposed framework first learns an over-complete dictionary from the high spatial frequency parts of the input image for reconstruction purposes. We then perform unsupervised clustering on the observed dictionary atoms (and their corresponding reconstructed image versions) via affinity propagation, which allows us to identify image-dependent components with similar context information. When applying the proposed method to image denoising, we are able to automatically determine the undesirable patterns (e.g., rain streaks or Gaussian noise) among the derived image components directly from the input image, so the task of single-image denoising can be addressed. Unlike prior image processing work based on sparse representation, our method neither needs to collect training image data in advance nor assumes image priors such as the relationship between input and output image dictionaries. We conduct experiments on two denoising problems: single-image denoising with Gaussian noise and rain removal. Our empirical results confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art image denoising algorithms.
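
The dictionary-then-clustering step can be sketched with standard tools. Below is a minimal illustration, not the authors' code: a random array stands in for the high-frequency band of an input image, `MiniBatchDictionaryLearning` learns the over-complete dictionary, and `AffinityPropagation` groups the atoms so that the clusters (e.g., a noise-like cluster) can be inspected.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
# Stand-in for the high-frequency band of the input image
# (e.g., the image minus a low-pass filtered version of itself).
image_hf = rng.standard_normal((96, 96))

patches = extract_patches_2d(image_hf, (8, 8), max_patches=2000,
                             random_state=0)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)    # remove patch DC

dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                   random_state=0).fit(patches)
atoms = dico.components_                          # (64 atoms, 64 dims)

# Cluster the atoms; affinity propagation chooses the number of
# clusters itself, as in the paper's pipeline.
labels = AffinityPropagation(damping=0.9, max_iter=500,
                             random_state=0).fit_predict(atoms)
vals, counts = np.unique(labels, return_counts=True)
print("atom cluster sizes:", dict(zip(vals.tolist(), counts.tolist())))
```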


European Conference on Computer Vision | 2014

Action-Reaction: Forecasting the Dynamics of Human Interaction

De-An Huang; Kris M. Kitani

Forecasting human activities from visual evidence is an emerging area of research that aims to allow computational systems to make predictions about unseen human actions. We explore the task of activity forecasting in the context of dual-agent interactions to understand how the actions of one person can be used to predict the actions of another. We model dual-agent interactions as an optimal control problem, where the actions of the initiating agent induce a cost topology over the space of reactive poses, a space in which the reactive agent plans an optimal pose trajectory. The technique developed in this work employs a kernel-based reinforcement learning approximation of the soft maximum value function to deal with the high-dimensional nature of human motion and applies a mean-shift procedure over a continuous cost function to infer a smooth reaction sequence. Experimental results show that our proposed method is able to properly model human interactions in a high-dimensional space of human poses. When compared to several baseline models, our method generates highly plausible simulations of human interaction.
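
A toy version of the soft (maximum-entropy) value function helps fix ideas. The sketch below replaces the paper's kernel-based, continuous pose space with a made-up discrete grid of poses and random transitions; only the soft Bellman backup and the resulting softmax policy follow the general maximum-entropy optimal control recipe.

```python
import numpy as np

# Toy discretization: 25 reactive "poses", 4 actions, an absorbing goal
# pose, and a state cost induced by the initiating agent (all synthetic).
n_states, n_actions, goal = 25, 4, 24
rng = np.random.default_rng(0)
cost = rng.uniform(0.1, 1.0, size=n_states)
cost[goal] = 0.0
nxt = rng.integers(0, n_states, size=(n_states, n_actions))
nxt[goal, :] = goal                        # goal pose is absorbing

V = np.zeros(n_states)
for _ in range(200):
    # Soft Bellman backup: V(s) = softmin_a [ c(s) + V(next(s, a)) ].
    Q = cost[:, None] + V[nxt]
    V = -np.log(np.exp(-Q).sum(axis=1))
    V[goal] = 0.0

# Maximum-entropy policy: pi(a|s) proportional to exp(V(s) - Q(s, a)).
policy = np.exp(V[:, None] - (cost[:, None] + V[nxt]))
policy /= policy.sum(axis=1, keepdims=True)
```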


European Conference on Computer Vision | 2016

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

De-An Huang; Li Fei-Fei; Juan Carlos Niebles

We propose a weakly supervised framework for action labeling in video, where only the order of the occurring actions is required during training. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework, which efficiently evaluates all possible alignments via dynamic programming and explicitly enforces their consistency with frame-to-frame visual similarities. This protects the model from being distracted by visually inconsistent or degenerate alignments without the need for temporal supervision. We further extend our framework to the semi-supervised case, where a few frames are sparsely annotated in a video. With less than 1% of labeled frames per video, our method outperforms existing semi-supervised approaches and achieves performance comparable to that of fully supervised approaches.
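
The alignment-summing dynamic program at the heart of (E)CTC can be sketched in a few lines. The example below is a simplification with synthetic scores: frames must traverse the given action order monotonically, and a made-up per-frame `stay` weight mimics ECTC's visual-similarity weighting; it is not the paper's exact recursion.

```python
import numpy as np

T, K = 12, 3                                   # frames, actions in order
rng = np.random.default_rng(0)
log_probs = np.log(rng.dirichlet(np.ones(K), size=T))  # (T, K) frame scores
order = [0, 1, 2]                              # given ordered action labels

# stay[t] ~ similarity(frame t, frame t-1): high similarity encourages
# keeping the same action label (the ECTC idea, heavily simplified).
stay = rng.uniform(0.5, 1.0, size=T)

alpha = np.full((T, len(order)), -np.inf)      # forward log-probabilities
alpha[0, 0] = log_probs[0, order[0]]
for t in range(1, T):
    for k, a in enumerate(order):
        # Either stay in action k (weighted by similarity) or advance
        # from action k-1; alignments stay monotone by construction.
        stay_score = alpha[t - 1, k] + np.log(stay[t])
        move_score = alpha[t - 1, k - 1] if k > 0 else -np.inf
        alpha[t, k] = np.logaddexp(stay_score, move_score) + log_probs[t, a]

# Total log-likelihood: all frames consumed, last action reached.
print(alpha[-1, -1])
```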


Computer Vision and Pattern Recognition | 2017

Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo; Boya Peng; De-An Huang; Alexandre Alahi; Li Fei-Fei

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed from RGB-D data. We use a Recurrent Neural Network based encoder-decoder framework to predict these sequences of flows. We argue that for the decoder to reconstruct these sequences, the encoder must learn a robust video representation that captures long-term motion dependencies and spatio-temporal relations. We demonstrate the effectiveness of our learned temporal representations on activity classification across multiple modalities and datasets such as NTU RGB+D and MSR Daily Activity 3D. Our framework is generic to the input modality, i.e., it handles RGB, depth, and RGB-D videos.
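
A minimal PyTorch sketch of the encoder-decoder idea, with hypothetical feature and flow dimensions and flows flattened to vectors: an encoder summarizes the image-pair features, and an autoregressive decoder emits a sequence of atomic-flow predictions.

```python
import torch
import torch.nn as nn

class FlowForecaster(nn.Module):
    def __init__(self, frame_dim=512, flow_dim=256, hidden=128, steps=8):
        super().__init__()
        self.steps = steps
        self.encoder = nn.GRU(frame_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(flow_dim, hidden)
        self.head = nn.Linear(hidden, flow_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, 2, frame_dim), features of the image pair.
        _, h = self.encoder(frame_feats)          # h: (1, batch, hidden)
        h = h.squeeze(0)
        flow = torch.zeros(frame_feats.size(0), self.head.out_features,
                           device=frame_feats.device)
        outputs = []
        for _ in range(self.steps):               # autoregressive decoding
            h = self.decoder(flow, h)
            flow = self.head(h)
            outputs.append(flow)
        return torch.stack(outputs, dim=1)        # (batch, steps, flow_dim)

model = FlowForecaster()
pred = model(torch.randn(4, 2, 512))              # 4 clips, 8 flow steps each
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))
```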


International Conference on Multimedia and Expo | 2012

Context-Aware Single Image Rain Removal

De-An Huang; Li-Wei Kang; Min-Chun Yang; Chia-Wen Lin; Yu-Chiang Frank Wang

Rain removal from a single image is a challenging image denoising problem. In this paper, we present a learning-based framework for single-image rain removal that focuses on learning context information from an input image, so that the rain patterns present in it can be automatically identified and removed. We approach the single-image rain removal problem as the integration of image decomposition and self-learning processes. More precisely, our method first performs context-constrained image segmentation on the input image, and we learn dictionaries for the high-frequency components in different context categories via sparse coding for reconstruction purposes. For image regions with rain streaks, dictionaries of distinct context categories will share common atoms that correspond to the rain patterns. By applying PCA and SVM classifiers to the learned dictionaries, our framework automatically identifies these common rain patterns, so rain streaks can be removed from the input image as particular high-frequency components. Unlike prior work on rain removal from images/videos that requires image priors or training image data from multiple frames, our self-learning approach only requires the input image itself, which saves considerable pre-training effort. Experimental results demonstrate the subjective and objective visual quality improvements achieved by our proposed method.
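
One step of the pipeline, identifying atoms shared across context dictionaries as rain candidates, can be illustrated as follows. Note the simplification: the paper applies PCA and SVM classifiers, while this sketch uses a plain nearest-atom cosine test on synthetic dictionaries to convey the same "common atoms are rain" idea.

```python
import numpy as np

rng = np.random.default_rng(0)
rain = rng.standard_normal((5, 64))             # shared "rain" atoms
dicts = []
for _ in range(3):                              # 3 context categories
    own = rng.standard_normal((27, 64))         # category-specific atoms
    D = np.vstack([rain + 0.05 * rng.standard_normal(rain.shape), own])
    dicts.append(D / np.linalg.norm(D, axis=1, keepdims=True))

# An atom is a rain candidate if some atom in every *other* category's
# dictionary is nearly identical to it (high absolute cosine similarity).
D0 = dicts[0]
is_rain = np.ones(len(D0), dtype=bool)
for D in dicts[1:]:
    sim = np.abs(D0 @ D.T).max(axis=1)          # best cosine match
    is_rain &= sim > 0.95
print("rain atom indices in category 0:", np.where(is_rain)[0])
```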


Computer Vision and Pattern Recognition | 2015

How do we use our hands? Discovering a diverse set of common grasps

De-An Huang; Minghuang Ma; Wei-Chiu Ma; Kris M. Kitani

Our aim is to show how state-of-the-art computer vision techniques can be used to advance prehensile analysis (i.e., understanding the functionality of human hands). Prehensile analysis is a broad field of multi-disciplinary interest, where researchers must manually analyze hours of hand-object interaction videos to understand the mechanics of hand manipulation. In this work, we present promising empirical results indicating that wearable cameras and unsupervised clustering techniques can be used to automatically discover common modes of human hand use. In particular, we use a first-person point-of-view camera to record common manipulation tasks and leverage its strengths for reliably observing human hand use. To learn a diverse set of hand-object interactions, we propose a fast online clustering algorithm based on the Determinantal Point Process (DPP). Furthermore, we develop a hierarchical extension to the DPP clustering algorithm and show that it can be used to discover appearance-based grasp taxonomies. Using a purely data-driven approach, our proposed algorithm obtains hand grasp taxonomies that roughly correspond to the classic Cutkosky grasp taxonomy. We validate our approach on over 10 hours of first-person point-of-view videos in both choreographed and real-life scenarios.
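
The core DPP operation, selecting a diverse subset, can be sketched with a greedy determinant-maximization step. The example below uses synthetic hand-patch descriptors and a generic L-ensemble kernel; the paper's fast online and hierarchical variants build on this basic selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 32))          # hand-patch descriptors
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
L = np.exp(feats @ feats.T)                     # similarity (L-ensemble) kernel

selected = []
for _ in range(10):                             # pick 10 diverse exemplars
    best, best_gain = None, -np.inf
    for i in range(len(L)):
        if i in selected:
            continue
        # Greedy DPP MAP step: maximize log-det of the kernel submatrix,
        # which rewards items dissimilar to those already selected.
        idx = selected + [i]
        _sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
        if logdet > best_gain:
            best, best_gain = i, logdet
    selected.append(best)
print("diverse exemplar indices:", selected)
```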


Visual Communications and Image Processing | 2012

Context-aware single image super-resolution using locality-constrained group sparse representation

Chih-Yun Tsai; De-An Huang; Min-Chun Yang; Li-Wei Kang; Yu-Chiang Frank Wang

We present a novel learning-based method for single image super-resolution (SR). Given a single low-resolution (LR) input image (and its image pyramid), we propose to learn context-specific image sparse representations, which model the relationship between low- and high-resolution image patch pairs of different context categories in terms of the learned dictionaries. To predict the SR image, we derive the context-specific sparse representation of each image patch in the LR input with additional locality and group sparsity constraints. While the locality constraint searches for the most similar image patches and uses the corresponding high-resolution outputs for SR, the group sparsity constraint allows us to utilize the information from the most relevant context categories when predicting the final SR output. Experimental results show that the proposed method achieves state-of-the-art performance both quantitatively and qualitatively.
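
The locality constraint can be illustrated with a small sketch (the group-sparsity term over context categories is omitted for brevity): each LR patch is reconstructed from its nearest LR training patches, and the same weights are transferred to the paired HR patches. Data, patch sizes, and the neighbor count are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
lr_train = rng.standard_normal((1000, 25))      # 5x5 LR training patches
hr_train = rng.standard_normal((1000, 100))     # paired 10x10 HR patches

knn = NearestNeighbors(n_neighbors=8).fit(lr_train)

def super_resolve_patch(lr_patch):
    # Locality: restrict the representation to the nearest LR patches.
    idx = knn.kneighbors(lr_patch[None], return_distance=False)[0]
    # Weights that best reconstruct the LR patch from its neighbors.
    w, *_ = np.linalg.lstsq(lr_train[idx].T, lr_patch, rcond=None)
    # Transfer the same weights to the paired HR patches.
    return hr_train[idx].T @ w

hr_patch = super_resolve_patch(rng.standard_normal(25))
print(hr_patch.shape)                           # (100,), i.e., a 10x10 patch
```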


International Conference on Computer-Aided Design | 2012

Compiling program control flows into biochemical reactions

De-An Huang; Jie-Hong R. Jiang; Ruei-Yang Huang; Chi-Yun Cheng

Computing with biochemical reactions is emerging in synthetic biology. With high-level programming languages, a target computation can be specified intuitively and effectively. As control flows form the skeleton of most programs, translating them into biochemical reactions is crucial but remains ad hoc. This paper presents a systematic approach to transforming control flows into robust molecular reactions. Case studies demonstrate its usefulness.
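
To give a flavor of such a compilation, here is a toy sketch, not the paper's construction: program locations become "program counter" species, each statement becomes a reaction consuming the current location and producing the next, and the loop test uses a standard fast/slow competing-reaction trick for absence detection. A tiny Gillespie-style simulation then executes the compiled while-loop.

```python
import random

# Program:  while (A > 0) { A--; B++; }
# Compiled reactions (species PC/PCbody/HALT encode the control flow):
#   r1: PC + A -> PCbody + A    fast "A present" test (A acts as catalyst)
#   r2: PC     -> HALT          slow default branch, taken when A is absent
#   r3: PCbody + A -> PC + B    loop body: consume one A, emit one B
state = {"PC": 1, "PCbody": 0, "HALT": 0, "A": 30, "B": 0}
reactions = [
    (("PC", "A"), ("PCbody", "A"), 10.0),
    (("PC",), ("HALT",), 0.01),
    (("PCbody", "A"), ("PC", "B"), 10.0),
]

random.seed(0)
while state["HALT"] == 0:
    # Gillespie-style selection: propensity = rate * reactant counts.
    props = []
    for ins, _outs, rate in reactions:
        p = rate
        for s in ins:
            p *= state[s]
        props.append(p)
    r = random.uniform(0, sum(props))
    for (ins, outs, _rate), p in zip(reactions, props):
        if r < p:
            for s in ins:
                state[s] -= 1
            for s in outs:
                state[s] += 1
            break
        r -= p

print(state)  # expect A == 0, B == 30, HALT == 1 with high probability
```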


ACM Multimedia | 2013

With one look: robust face recognition using single sample per person

De-An Huang; Yu-Chiang Frank Wang

In this paper, we address the problem of robust face recognition using a single sample per person. Given only one training image per subject of interest, our proposed method is able to recognize query images with illumination or expression changes, or even images corrupted by occlusion. To model these intra-class variations, we advocate the use of external data (i.e., images of subjects not of interest) for learning an exemplar-based dictionary. This dictionary provides auxiliary yet representative information for handling intra-class variation, while the gallery set containing one training image per class preserves separation between different subjects for recognition purposes. Our experiments on two face datasets confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art sparse representation based methods.
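
The recognition rule can be sketched as follows, with synthetic data and a plain lasso standing in for the paper's learned exemplar-based dictionary: the query is sparsely coded over the concatenation of the one-image-per-person gallery and an external variation dictionary, and the class with the smallest class-wise reconstruction residual wins.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, n_classes, n_var = 100, 10, 40
G = rng.standard_normal((d, n_classes))     # gallery: 1 image per person
E = rng.standard_normal((d, n_var))         # external variation dictionary
G /= np.linalg.norm(G, axis=0)
E /= np.linalg.norm(E, axis=0)

true_class = 3                              # query = gallery face + variation
query = G[:, true_class] + 0.05 * (E @ rng.standard_normal(n_var))

# Sparsely code the query over [gallery | variation dictionary].
A = np.hstack([G, E])
coef = Lasso(alpha=0.01, fit_intercept=False,
             max_iter=5000).fit(A, query).coef_
g, e = coef[:n_classes], coef[n_classes:]

# Classify by class-wise residual: keep only class c's gallery term plus
# the shared variation component; the smallest residual wins.
residuals = [np.linalg.norm(query - (G[:, c] * g[c] + E @ e))
             for c in range(n_classes)]
print("predicted:", int(np.argmin(residuals)), "true:", true_class)
```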

Collaboration


Dive into De-An Huang's collaborations.

Top Co-Authors

Kris M. Kitani (Carnegie Mellon University)
Wei-Chiu Ma (Massachusetts Institute of Technology)
Li-Wei Kang (National Yunlin University of Science and Technology)
Min-Chun Yang (National Taiwan University)
Animesh Garg (University of California)