Publication


Featured research published by Licheng Yu.


European Conference on Computer Vision | 2016

Modeling Context in Referring Expressions

Licheng Yu; Patrick Poirson; Shan Yang; Alexander C. Berg; Tamara L. Berg

Humans refer to objects in their environments all the time, especially in dialogue with other people. We explore generating and comprehending natural language referring expressions for objects in images. In particular, we focus on incorporating better measures of visual context into referring expression models and find that visual comparison to other objects within an image helps improve performance significantly. We also develop methods to tie the language generation process together, so that we generate expressions for all objects of a particular category jointly. Evaluation on three recent datasets, RefCOCO, RefCOCO+, and RefCOCOg (datasets and toolbox can be downloaded from https://github.com/lichengunc/refer), shows the advantages of our methods for both referring expression generation and comprehension.
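
To make the "visual comparison to other objects" idea concrete, here is a minimal sketch of difference features between a target region and other same-category regions in the image. The function name, feature layout, and normalization are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def visual_comparison_features(target_feat, target_box, other_feats, other_boxes):
    """Illustrative visual-difference features for one target region.

    target_feat: (d,) CNN appearance feature of the target object
    target_box:  (4,) [x, y, w, h] of the target object
    other_feats: (n, d) features of other same-category objects in the image
    other_boxes: (n, 4) their boxes
    """
    if len(other_feats) == 0:
        return np.zeros_like(target_feat), np.zeros(4)

    # Appearance comparison: mean of normalized feature differences.
    diffs = target_feat[None, :] - other_feats
    diffs = diffs / (np.linalg.norm(diffs, axis=1, keepdims=True) + 1e-8)
    appearance_diff = diffs.mean(axis=0)

    # Location comparison: offsets and relative sizes w.r.t. the target box.
    tx, ty, tw, th = target_box
    ox, oy, ow, oh = other_boxes.T
    loc_diff = np.stack([(tx - ox) / tw, (ty - oy) / th,
                         ow / tw, oh / th], axis=1).mean(axis=0)
    return appearance_diff, loc_diff
```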


International Conference on Computer Vision | 2015

Visual Madlibs: Fill in the Blank Description Generation and Question Answering

Licheng Yu; Eunbyung Park; Alexander C. Berg; Tamara L. Berg

In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context. We provide several analyses of the Visual Madlibs dataset and demonstrate its applicability to two new description generation tasks: focused description generation, and multiple-choice question-answering for images. Experiments using joint-embedding and deep learning methods show promising results on these tasks.
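
As a rough illustration of the two tasks the dataset supports, the sketch below shows hypothetical fill-in-the-blank prompts and a joint-embedding scorer for the multiple-choice setting. The template wording and the embedding details are assumptions, not taken verbatim from the dataset or the authors' models.

```python
import numpy as np

# Hypothetical fill-in-the-blank prompts in the spirit of Visual Madlibs;
# the exact template wording in the dataset may differ.
TEMPLATES = [
    "The place is a(n) ____.",
    "The person is ____.",
    "When I look at this picture, I feel ____.",
]

def multiple_choice_answer(image_emb, choice_embs):
    """Pick the candidate whose embedding is closest to the image embedding.

    image_emb:   (d,)   image representation in a joint image-text space
    choice_embs: (k, d) embeddings of the k candidate answers
    Returns the index of the best-scoring choice (cosine similarity).
    """
    img = image_emb / (np.linalg.norm(image_emb) + 1e-8)
    txt = choice_embs / (np.linalg.norm(choice_embs, axis=1, keepdims=True) + 1e-8)
    return int(np.argmax(txt @ img))
```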


IEEE Transactions on Image Processing | 2015

Vector Sparse Representation of Color Image Using Quaternion Matrix Analysis

Yi Xu; Licheng Yu; Hongteng Xu; Hao Zhang; Truong Q. Nguyen

Traditional sparse image models treat a color image pixel as a scalar, representing the color channels separately or concatenating them into a monochrome image. In this paper, we propose a vector sparse representation model for color images using quaternion matrix analysis. As a new tool for color image representation, its potential applications in several image-processing tasks are presented, including color image reconstruction, denoising, inpainting, and super-resolution. The proposed model represents the color image as a quaternion matrix, and a quaternion-based dictionary learning algorithm is presented using the K-quaternion singular value decomposition (K-QSVD, a generalization of K-means clustering to QSVD) method. It conducts sparse basis selection in quaternion space, which uniformly transforms the channel images into an orthogonal color space. In this new color space, the inherent color structures can be completely preserved during vector reconstruction. Moreover, the proposed sparse model is more efficient than current sparse models for image restoration tasks due to the lower redundancy between atoms of different color channels. The experimental results demonstrate that the proposed sparse image model successfully avoids the hue bias issue and shows its potential as a general and powerful tool in color image analysis and processing.
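
For reference, the core representation can be written as one pure quaternion per pixel, with sparse coding over a quaternion dictionary learned by K-QSVD. This is a schematic of the model, not the paper's exact notation.

```latex
% One color pixel as a pure quaternion, so the image becomes a quaternion matrix:
\dot{q}_{ij} = 0 + r_{ij}\,\mathbf{i} + g_{ij}\,\mathbf{j} + b_{ij}\,\mathbf{k}

% Sparse coding of quaternion patches \dot{Q} over a quaternion dictionary \dot{D}
% learned with K-QSVD; each coefficient column \dot{\alpha}_m is sparse:
\min_{\dot{D},\,\dot{A}} \;\bigl\lVert \dot{Q} - \dot{D}\dot{A} \bigr\rVert_F^2
\quad \text{s.t.}\quad \lVert \dot{\alpha}_m \rVert_0 \le L \;\; \forall m
```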


Computer Vision and Pattern Recognition | 2017

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

Licheng Yu; Hao Tan; Mohit Bansal; Tamara L. Berg

Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker modules are trained jointly in an end-to-end learning framework, allowing the modules to be aware of one another during learning while also benefiting from the discriminative reinforcer’s feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring expression datasets.
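
The sketch below shows one way the three objectives could be combined into a single training loss, assuming precomputed speaker logits, listener matching scores, and reinforcer rewards. The names, shapes, and weightings are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def joint_loss(gen_logits, gen_targets,
               pos_score, neg_obj_score, neg_sent_score,
               sample_logprobs, rewards,
               margin=0.1, w_listen=1.0, w_reinforce=1.0):
    """Illustrative combination of the speaker, listener, and reinforcer objectives.

    gen_logits:      (T, V) speaker logits for the ground-truth expression
    gen_targets:     (T,)   ground-truth word indices
    pos_score:       ()     listener score for the matching (object, expression) pair
    neg_obj_score:   ()     score with the object swapped for a wrong one
    neg_sent_score:  ()     score with the expression swapped for a wrong one
    sample_logprobs: (S,)   log-probs of words in a sampled expression
    rewards:         (S,)   per-sample reward from the reinforcer (no gradient)
    """
    # Speaker: cross-entropy over the ground-truth expression.
    speak = F.cross_entropy(gen_logits, gen_targets)

    # Listener: max-margin ranking against mismatched objects and expressions.
    listen = (F.relu(margin + neg_obj_score - pos_score)
              + F.relu(margin + neg_sent_score - pos_score))

    # Reinforcer: REINFORCE-style term that raises the probability of
    # sampled expressions in proportion to their reward.
    reinforce = -(rewards.detach() * sample_logprobs).mean()

    return speak + w_listen * listen + w_reinforce * reinforce
```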


Empirical Methods in Natural Language Processing | 2017

Hierarchically-Attentive RNN for Album Summarization and Storytelling

Licheng Yu; Mohit Bansal; Tamara L. Berg

We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.
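
A toy sketch of the hierarchical idea: encode the album photos, score them for the summary, and decode a story from the attended album context. Layer sizes and structure are assumptions for illustration and do not reproduce the paper's exact model.

```python
import torch
import torch.nn as nn

class AlbumSummarizer(nn.Module):
    """Toy hierarchical sketch: album encoder, photo-selection scores, story decoder."""

    def __init__(self, feat_dim=2048, hid=512, vocab=10000):
        super().__init__()
        self.album_enc = nn.GRU(feat_dim, hid, batch_first=True, bidirectional=True)
        self.select = nn.Linear(2 * hid, 1)       # photo-selection (summary) scores
        self.decoder = nn.GRUCell(2 * hid, hid)   # one story-decoding step
        self.word_out = nn.Linear(hid, vocab)

    def forward(self, photo_feats, steps=20):
        # photo_feats: (1, n_photos, feat_dim) pre-extracted CNN features
        enc, _ = self.album_enc(photo_feats)           # (1, n, 2*hid)
        scores = self.select(enc).squeeze(-1)          # (1, n) summary scores
        attn = torch.softmax(scores, dim=-1)
        context = (attn.unsqueeze(-1) * enc).sum(1)    # (1, 2*hid) album context

        h = torch.zeros(1, self.decoder.hidden_size)
        words = []
        for _ in range(steps):                         # greedy story decoding
            h = self.decoder(context, h)
            words.append(self.word_out(h).argmax(-1))
        return scores, torch.stack(words, dim=1)
```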


Visual Communications and Image Processing | 2013

Single image super-resolution via phase congruency analysis

Licheng Yu; Yi Xu; Bo Zhang

Single image super-resolution (SR) is a severely under-constrained task. While self-example-based methods are able to reproduce sharp edges, they perform poorly on textures. To recover fine details, example-based SR methods employ higher-level image segmentation and a corresponding external texture database, but they involve too much human interaction. In this paper, we discuss the existing problems of example-based techniques using scale-space analysis. Accordingly, a robust pixel classification method is designed based on the phase congruency model in scale space, which can effectively divide images into edge, texture, and flat regions. A super-resolution framework is then proposed that adaptively emphasizes the importance of high-frequency residuals in structural examples and the scale-invariant fractal property in textural regions. Experimental results show that our SR approach is able to produce both sharp edges and vivid textures with few artifacts.
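
The pixel classification step could look roughly like the sketch below, which splits an image into edge, texture, and flat masks from phase congruency maps at two scales. The maps are assumed to be computed elsewhere (e.g. with a log-Gabor filter bank), and the thresholds are illustrative, not the paper's values.

```python
import numpy as np

def classify_regions(pc_fine, pc_coarse, high=0.4, low=0.1):
    """Edge / texture / flat masks from phase congruency at two scales.

    pc_fine, pc_coarse: (H, W) phase congruency maps in [0, 1] at a fine
    and a coarse scale, computed by an external phase congruency routine.
    """
    # Edges: salient structure that persists across scales.
    edges = (pc_fine >= high) & (pc_coarse >= high)
    # Textures: fine-scale structure that washes out at the coarse scale.
    texture = (pc_fine >= low) & ~edges
    # Flat: little phase congruency at either scale.
    flat = ~(edges | texture)
    return edges, texture, flat
```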


arXiv: Computer Vision and Pattern Recognition | 2015

Visual Madlibs: Fill in the blank Image Generation and Question Answering

Licheng Yu; Eunbyung Park; Alexander C. Berg; Tamara L. Berg


Computer Vision and Pattern Recognition | 2018

MAttNet: Modular Attention Network for Referring Expression Comprehension

Licheng Yu; Zhe Lin; Xiaohui Shen; Jimei Yang; Xin Lu; Mohit Bansal; Tamara L. Berg


National Conference on Artificial Intelligence | 2015

Dictionary learning with mutually reinforcing group-graph structures

Hongteng Xu; Licheng Yu; Dixin Luo; Hongyuan Zha; Yi Xu


arXiv: Computer Vision and Pattern Recognition | 2016

Detailed Garment Recovery from a Single-View Image

Shan Yang; Tanya Ambert; Zherong Pan; Ke Wang; Licheng Yu; Tamara L. Berg; Ming C. Lin

Collaboration


Dive into Licheng Yu's collaborations.

Top Co-Authors

Tamara L. Berg, University of North Carolina at Chapel Hill
Hongteng Xu, Georgia Institute of Technology
Mohit Bansal, Toyota Technological Institute at Chicago
Yi Xu, Shanghai Jiao Tong University
Alexander C. Berg, University of North Carolina at Chapel Hill
Hongyuan Zha, Georgia Institute of Technology
Shan Yang, University of North Carolina at Chapel Hill
Eunbyung Park, University of North Carolina at Chapel Hill
Ke Wang, University of North Carolina at Chapel Hill
Mark A. Davenport, Georgia Institute of Technology