Network


Latest external collaborations, aggregated at the country level.

Hotspot


Dive into the research topics where Hailin Jin is active.

Publication


Featured research published by Hailin Jin.


International Conference on Computer Graphics and Interactive Techniques | 2009

Content-preserving warps for 3D video stabilization

Feng Liu; Michael Gleicher; Hailin Jin; Aseem Agarwala

We describe a technique that transforms a video from a hand-held video camera so that it appears as if it were taken with a directed camera motion. Our method adjusts the video to appear as if it were taken from nearby viewpoints, allowing 3D camera movements to be simulated. By aiming only for perceptual plausibility, rather than accurate reconstruction, we are able to develop algorithms that can effectively recreate dynamic scenes from a single source video. Our technique first recovers the original 3D camera motion and a sparse set of 3D, static scene points using an off-the-shelf structure-from-motion system. Then, a desired camera path is computed either automatically (e.g., by fitting a linear or quadratic path) or interactively. Finally, our technique performs a least-squares optimization that computes a spatially-varying warp from each input video frame into an output frame. The warp is computed to both follow the sparse displacements suggested by the recovered 3D structure, and avoid deforming the content in the video frame. Our experiments on stabilizing challenging videos of dynamic scenes demonstrate the effectiveness of our technique.
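
The spatially-varying warp can be posed as a linear least-squares problem over a grid mesh laid on each frame. The Python sketch below illustrates that idea, assuming a bilinear data term driven by the recovered sparse displacements and a simple first-order smoothness regularizer in place of the paper's content-preserving term; the function name, grid resolution, and weights are illustrative, not the authors' implementation.

    # A minimal sketch (not the paper's exact formulation): sparse feature
    # displacements act as data constraints on a grid of vertex displacements,
    # and a smoothness term keeps neighbouring vertices moving together.
    import numpy as np

    def fit_grid_warp(points, targets, frame_w, frame_h, gw=16, gh=9, smooth=2.0):
        """points, targets: (N, 2) matched positions in the input/output frames."""
        nx, ny = gw + 1, gh + 1                  # grid vertex counts
        cell_w, cell_h = frame_w / gw, frame_h / gh
        n_vars = nx * ny * 2                     # x and y displacement per vertex
        rows, rhs = [], []

        def vid(ix, iy):                         # flat vertex index
            return iy * nx + ix

        # Data term: each tracked point is a bilinear blend of its cell's vertices.
        for (px, py), (tx, ty) in zip(points, targets):
            ix = min(int(px // cell_w), gw - 1)
            iy = min(int(py // cell_h), gh - 1)
            u, v = px / cell_w - ix, py / cell_h - iy
            blend = [(vid(ix, iy), (1 - u) * (1 - v)), (vid(ix + 1, iy), u * (1 - v)),
                     (vid(ix, iy + 1), (1 - u) * v),   (vid(ix + 1, iy + 1), u * v)]
            for axis, target_disp in ((0, tx - px), (1, ty - py)):
                row = np.zeros(n_vars)
                for idx, wt in blend:
                    row[2 * idx + axis] = wt
                rows.append(row); rhs.append(target_disp)

        # Smoothness term: neighbouring vertices should displace similarly.
        for iy in range(ny):
            for ix in range(nx):
                for jx, jy in ((ix + 1, iy), (ix, iy + 1)):
                    if jx < nx and jy < ny:
                        for axis in (0, 1):
                            row = np.zeros(n_vars)
                            row[2 * vid(ix, iy) + axis] = smooth
                            row[2 * vid(jx, jy) + axis] = -smooth
                            rows.append(row); rhs.append(0.0)

        sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        return sol.reshape(ny, nx, 2)            # per-vertex (dx, dy) displacements

The returned displacement field can then be used to resample each input frame into the stabilized output frame.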


Computer Vision and Pattern Recognition | 2016

Image Captioning with Semantic Attention

Quanzeng You; Hailin Jin; Zhaowen Wang; Chen Fang; Jiebo Luo

Automatically generating a natural language description of an image has attracted interests recently both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our algorithm learns to selectively attend to semantic concept proposals and fuse them into hidden states and outputs of recurrent neural networks. The selection and fusion form a feedback connecting the top-down and bottom-up computation. We evaluate our algorithm on two public benchmarks: Microsoft COCO and Flickr30K. Experimental results show that our algorithm significantly outperforms the state-of-the-art approaches consistently across different evaluation metrics.
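
The attention step itself is compact. The Python sketch below shows one plausible form: a learned bilinear score between the recurrent hidden state and each concept-proposal embedding, a softmax over proposals, and a fused input vector; the function names, shapes, and fusion rule are illustrative assumptions, not the paper's exact model (which uses separate input and output attention).

    # A minimal sketch of attention over semantic concept proposals.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend_and_fuse(h, concepts, W_score, W_fuse):
        """h: (d,) hidden state; concepts: (k, d) concept embeddings."""
        scores = concepts @ W_score @ h          # relevance of each proposal to h
        alpha = softmax(scores)                  # attention weights, shape (k,)
        context = alpha @ concepts               # weighted semantic context, shape (d,)
        return np.tanh(W_fuse @ np.concatenate([h, context])), alpha

    # Usage with random placeholders standing in for learned parameters.
    d, k = 8, 5
    rng = np.random.default_rng(0)
    fused, alpha = attend_and_fuse(rng.normal(size=d), rng.normal(size=(k, d)),
                                   rng.normal(size=(d, d)), rng.normal(size=(d, 2 * d)))
    print(alpha.round(3), fused.shape)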


ACM Transactions on Graphics | 2011

Subspace video stabilization

Feng Liu; Michael Gleicher; Jue Wang; Hailin Jin; Aseem Agarwala

We present a robust and efficient approach to video stabilization that achieves high-quality camera motion for a wide range of videos. In this article, we focus on the problem of transforming a set of input 2D motion trajectories so that they are both smooth and resemble visually plausible views of the imaged scene; our key insight is that we can achieve this goal by enforcing subspace constraints on feature trajectories while smoothing them. Our approach assembles tracked features in the video into a trajectory matrix, factors it into two low-rank matrices, and performs filtering or curve fitting in a low-dimensional linear space. In order to process long videos, we propose a moving factorization that is both efficient and streamable. Our experiments confirm that our approach can efficiently provide stabilization results comparable with prior 3D methods in cases where those methods succeed, but also provides smooth camera motions in cases where such approaches often fail, such as videos that lack parallax. The presented approach offers the first method that both achieves high-quality video stabilization and is practical enough for consumer applications.
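
The factorization-and-filtering idea can be illustrated in a few lines of Python, assuming a complete trajectory matrix rather than the incomplete, streaming tracks that the paper's moving factorization is designed to handle; the rank and Gaussian smoothing kernel below are illustrative choices.

    # A minimal sketch: factor the stacked trajectory matrix into a low-rank
    # basis, smooth the low-dimensional "eigen-trajectories" over time, and
    # reconstruct smoothed feature trajectories.
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def smooth_trajectories(M, rank=9, sigma=20.0):
        """M: (2 * num_tracks, num_frames) stacked x/y feature trajectories."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        coeffs = U[:, :rank] * s[:rank]          # per-track coefficients
        eigen_traj = Vt[:rank, :]                # low-dimensional trajectories
        eigen_smooth = gaussian_filter1d(eigen_traj, sigma, axis=1)
        return coeffs @ eigen_smooth             # smoothed trajectory matrix

    # Usage on synthetic random-walk trajectories: 50 tracks over 200 frames.
    rng = np.random.default_rng(0)
    M = np.cumsum(rng.normal(size=(100, 200)), axis=1)
    print(smooth_trajectories(M).shape)          # (100, 200)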


International Conference on Computer Vision | 2001

Real-time feature tracking and outlier rejection with changes in illumination

Hailin Jin; Paolo Favaro; Stefano Soatto

We develop an efficient algorithm to track point features supported by image patches undergoing affine deformations and changes in illumination. The algorithm is based on a combined model of geometry and photometry that is used to track features as well as to detect outliers in a hypothesis testing framework. The algorithm runs in real time on a personal computer and is available to the public.
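
A simplified version of the combined model is easy to write down: a template patch is related to the current image by a geometric warp plus an affine change in brightness and contrast, and all parameters are refined jointly by Gauss-Newton. The Python sketch below restricts the warp to pure translation and samples at the nearest pixel; the full affine deformation and the hypothesis-testing outlier rejection of the paper are omitted, and the names and thresholds are illustrative.

    # A minimal sketch of tracking one patch under a joint geometric/photometric model.
    import numpy as np

    def track_patch(I1, I2, center, half=7, iters=20):
        """Estimate translation (dy, dx) and photometric gain/bias for one patch."""
        cy, cx = center
        T = I1[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)
        d = np.zeros(2)                              # (dy, dx)
        gain, bias = 1.0, 0.0
        gy, gx = np.gradient(I2.astype(float))       # image gradients
        for _ in range(iters):
            iy, ix = int(round(cy + d[0])), int(round(cx + d[1]))
            P  = I2[iy - half:iy + half + 1, ix - half:ix + half + 1].astype(float)
            Gy = gy[iy - half:iy + half + 1, ix - half:ix + half + 1].ravel()
            Gx = gx[iy - half:iy + half + 1, ix - half:ix + half + 1].ravel()
            r = (gain * T + bias - P).ravel()        # photometric residual
            # Jacobian columns: d(residual)/d(dy, dx, gain, bias)
            J = np.stack([-Gy, -Gx, T.ravel(), np.ones_like(r)], axis=1)
            upd, *_ = np.linalg.lstsq(J, -r, rcond=None)
            d += upd[:2]; gain += upd[2]; bias += upd[3]
            if np.linalg.norm(upd[:2]) < 1e-3:
                break
        return d, gain, bias, float(np.mean(r ** 2))  # residual can feed an outlier test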


Computer Vision and Pattern Recognition | 2015

Fine-grained recognition without part annotations

Jonathan Krause; Hailin Jin; Jianchao Yang; Li Fei-Fei

Scaling up fine-grained recognition to all domains of fine-grained objects is a challenge the computer vision community will need to face in order to realize its goal of recognizing all object categories. Current state-of-the-art techniques rely heavily upon the use of keypoint or part annotations, but scaling up to hundreds or thousands of domains renders this annotation cost-prohibitive for all but the most important categories. In this work we propose a method for fine-grained recognition that uses no part annotations. Our method is based on generating parts using co-segmentation and alignment, which we combine in a discriminative mixture. Experimental results show its efficacy, demonstrating state-of-the-art results even when compared to methods that use part annotations during training.
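
The classification stage that follows part generation is straightforward to sketch: features from the whole image and from each generated part region are concatenated and fed to a linear classifier. The Python sketch below assumes part boxes have already been produced (e.g. by co-segmentation and alignment, as the paper describes) and that some feature extractor embed(crop) is available; both helpers and the choice of a linear SVM are assumptions for illustration.

    # A minimal sketch of pooling whole-image and part features for classification.
    import numpy as np
    from sklearn.svm import LinearSVC

    def image_descriptor(image, part_boxes, embed):
        """Concatenate whole-image and per-part features into one vector."""
        feats = [embed(image)]
        for (x0, y0, x1, y1) in part_boxes:
            feats.append(embed(image[y0:y1, x0:x1]))
        return np.concatenate(feats)

    def train_classifier(descriptors, labels):
        clf = LinearSVC(C=1.0)                   # one linear classifier over pooled features
        clf.fit(np.stack(descriptors), labels)
        return clf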


Computer Vision and Pattern Recognition | 2008

Stereoscopic inpainting: Joint color and depth completion from stereo images

Liang Wang; Hailin Jin; Ruigang Yang; Minglun Gong

We present a novel algorithm for simultaneous color and depth inpainting. The algorithm takes stereo images and estimated disparity maps as input and fills in missing color and depth information introduced by occlusions or object removal. We first complete the disparities for the occlusion regions using a segmentation-based approach. The completed disparities can be used to facilitate the user in labeling objects to be removed. Since part of the removed regions in one image is visible in the other, we mutually complete the two images through 3D warping. Finally, we complete the remaining unknown regions using a depth-assisted texture synthesis technique, which simultaneously fills in both color and depth. We demonstrate the effectiveness of the proposed algorithm on several challenging data sets.
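
The mutual-completion step can be sketched directly: a pixel that is missing in one view is filled from the other view by following the completed disparity, and whatever remains unfilled is left for the depth-assisted texture synthesis stage. In the Python sketch below, the disparity sign convention, nearest-pixel lookup, and function name are assumptions.

    # A minimal sketch of filling holes in the left view from the right view.
    import numpy as np

    def fill_from_other_view(left, right, disp_left, hole_mask):
        """left, right: (H, W, 3); disp_left: (H, W); hole_mask: True where missing."""
        _, W = disp_left.shape
        filled, still_missing = left.copy(), hole_mask.copy()
        ys, xs = np.nonzero(hole_mask)
        xr = np.round(xs - disp_left[ys, xs]).astype(int)   # column in the right view
        ok = (xr >= 0) & (xr < W)
        filled[ys[ok], xs[ok]] = right[ys[ok], xr[ok]]
        still_missing[ys[ok], xs[ok]] = False
        return filled, still_missing              # leftovers go to texture synthesis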


Computer Vision and Pattern Recognition | 2003

Multi-view stereo beyond Lambert

Hailin Jin; Stefano Soatto; Anthony J. Yezzi

We consider the problem of estimating the shape and radiance of an object from a calibrated set of views under the assumption that the reflectance of the object is non-Lambertian. Unlike traditional stereo, we do not solve the correspondence problem by comparing image-to-image. Instead, we exploit a rank constraint on the radiance tensor field of the surface in space, and use it to define a discrepancy measure between each image and the underlying model. Our approach automatically returns an estimate of the radiance of the scene, along with its shape, represented by a dense surface. The former can be used to generate novel views that capture the non-Lambertian appearance of the scene.
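
The rank constraint suggests a simple discrepancy measure: stack the intensities that a surface patch projects to in the different views into a matrix and measure how much energy lies beyond the assumed rank. The Python sketch below illustrates this; the choice of rank and the normalization are illustrative, not the paper's exact functional.

    # A minimal sketch of a rank-based discrepancy for one surface patch.
    import numpy as np

    def rank_discrepancy(radiance_samples, rank=2):
        """radiance_samples: (num_views, num_points) observed intensities."""
        s = np.linalg.svd(radiance_samples, compute_uv=False)
        return float(np.sum(s[rank:] ** 2) / (np.sum(s ** 2) + 1e-12))

A low value means the observations are consistent with the low-rank radiance model; a high value penalizes the current shape estimate at that patch.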


ACM Multimedia | 2014

RAPID: Rating Pictorial Aesthetics using Deep Learning

Xin Lu; Zhe Lin; Hailin Jin; Jianchao Yang; James Zijun Wang

Effective visual features are essential for computational aesthetic quality rating systems. Existing methods used machine learning and statistical modeling techniques on handcrafted features or generic image descriptors. A recently-published large-scale dataset, the AVA dataset, has further empowered machine learning based approaches. We present the RAPID (RAting PIctorial aesthetics using Deep learning) system, which adopts a novel deep neural network approach to enable automatic feature learning. The central idea is to incorporate heterogeneous inputs generated from the image, which include a global view and a local view, and to unify the feature learning and classifier training using a double-column deep convolutional neural network. In addition, we utilize the style attributes of images to help improve the aesthetic quality categorization accuracy. Experimental results show that our approach significantly outperforms the state of the art on the AVA dataset.
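
The double-column idea is easy to express in a modern framework. The PyTorch sketch below pairs a column that sees a resized global view with one that sees a local patch and fuses their features for a binary quality prediction; the layer sizes and input resolution are illustrative and not the paper's architecture (which also incorporates style attributes).

    # A minimal sketch of a double-column convolutional network.
    import torch
    import torch.nn as nn

    class Column(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        def forward(self, x):
            return self.net(x)                    # (B, 32) feature vector

    class DoubleColumnNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.global_col = Column()            # resized whole image
            self.local_col = Column()             # cropped local patch
            self.head = nn.Linear(64, 2)
        def forward(self, global_view, local_patch):
            f = torch.cat([self.global_col(global_view),
                           self.local_col(local_patch)], dim=1)
            return self.head(f)                   # logits for low/high quality

    # Usage with dummy tensors.
    net = DoubleColumnNet()
    print(net(torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224)).shape)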


International Journal of Computer Vision | 2005

Multi-View Stereo Reconstruction of Dense Shape and Complex Appearance

Hailin Jin; Stefano Soatto; Anthony J. Yezzi

We address the problem of estimating the three-dimensional shape and complex appearance of a scene from a calibrated set of views under fixed illumination. Our approach relies on a rank condition that must be satisfied when the scene exhibits “specular + diffuse” reflectance characteristics. This constraint is used to define a cost functional for the discrepancy between the measured images and those generated by the estimate of the scene, rather than attempting to match image-to-image directly. Minimizing such a functional yields the optimal estimate of the shape of the scene, represented by a dense surface, as well as its radiance, represented by four functions defined on such a surface. These can be used to generate novel views that capture the non-Lambertian appearance of the scene.


The Visual Computer | 2003

A semi-direct approach to structure from motion

Hailin Jin; Paolo Favaro; Stefano Soatto

The problem of structure from motion is often decomposed into two steps: feature correspondence and three-dimensional reconstruction. This separation often causes gross errors when establishing correspondence fails. Therefore, we advocate the necessity to integrate visual information not only in time (i.e. across different views), but also in space, by matching regions – rather than points – using explicit photometric deformation models. We present an algorithm that integrates image-feature tracking and three-dimensional motion estimation into a closed loop, while detecting and rejecting outlier regions that do not fit the model. Due to occlusions and the causal nature of our algorithm, a drift in the estimates accumulates over time. We describe a method to perform global registration of local estimates of motion and structure by matching the appearance of feature regions stored over long time periods. We use image intensities to construct a score function that takes into account changes in brightness and contrast. Our algorithm is recursive and suitable for real-time implementation.
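
A matching score that tolerates affine changes in brightness and contrast can be as simple as a normalized cross-correlation between the stored feature region and a candidate region, as sketched below in Python; this particular form is an assumption used for illustration, not necessarily the exact score used in the paper.

    # A minimal sketch: zero-mean, unit-norm correlation is invariant to gain/offset.
    import numpy as np

    def ncc_score(patch_a, patch_b, eps=1e-8):
        a = patch_a.astype(float).ravel() - patch_a.astype(float).mean()
        b = patch_b.astype(float).ravel() - patch_b.astype(float).mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))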

Collaboration


Dive into Hailin Jin's collaborations.

Top Co-Authors

Stefano Soatto (University of California)
Anthony J. Yezzi (Georgia Institute of Technology)
Feng Liu (Portland State University)
Quanzeng You (University of Rochester)