Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yanghao Li is active.

Publication


Featured research published by Yanghao Li.


European Conference on Computer Vision | 2016

Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks

Yanghao Li; Cuiling Lan; Junliang Xing; Wenjun Zeng; Chunfeng Yuan; Jiaying Liu

Human action recognition from well-segmented 3D skeleton data has been intensively studied and attracts increasing attention. Online action detection goes one step further and is more challenging: it identifies the action type and localizes the action positions on the fly in untrimmed streaming data. In this paper, we study the problem of online action detection from streaming skeleton data. We propose a multi-task, end-to-end Joint Classification-Regression Recurrent Neural Network to better exploit action type and temporal localization information. By employing a joint classification and regression optimization objective, the network automatically localizes the start and end points of actions more accurately. Specifically, by leveraging the merits of a deep Long Short-Term Memory (LSTM) subnetwork, the proposed model captures complex long-range temporal dynamics, which naturally avoids the typical sliding-window design and thus ensures high computational efficiency. Furthermore, the regression subtask provides the ability to forecast an action prior to its occurrence. To evaluate the proposed model, we build a large annotated streaming video dataset. Experimental results on our dataset and the public G3D dataset both demonstrate very promising performance of our scheme.
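
The joint objective lends itself to a compact sketch. The PyTorch snippet below is a minimal, hypothetical rendering of the idea, not the paper's exact architecture: the input size (25 joints x 3 coordinates), hidden width, class count, and loss weight are all assumptions. A shared LSTM feeds a per-frame classification head and a regression head that fits soft start/end confidence curves.

```python
import torch
import torch.nn as nn

class JointClassRegRNN(nn.Module):
    """Minimal joint classification-regression RNN (hypothetical sizes)."""
    def __init__(self, in_dim=75, hidden=100, n_classes=21):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.cls_head = nn.Linear(hidden, n_classes)  # per-frame action class
        self.reg_head = nn.Linear(hidden, 2)          # start/end confidence curves

    def forward(self, skel):                          # skel: (B, T, in_dim)
        h, _ = self.lstm(skel)
        return self.cls_head(h), torch.sigmoid(self.reg_head(h))

# Joint objective: per-frame cross-entropy plus regression to soft start/end targets.
model = JointClassRegRNN()
x = torch.randn(2, 50, 75)                  # 2 clips, 50 frames, 25 joints x 3 coords
logits, conf = model(x)
labels = torch.randint(0, 21, (2, 50))      # per-frame action labels
targets = torch.rand(2, 50, 2)              # soft start/end confidence targets
loss = nn.functional.cross_entropy(logits.reshape(-1, 21), labels.reshape(-1)) \
     + 1.0 * nn.functional.mse_loss(conf, targets)
```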


International Joint Conference on Artificial Intelligence | 2017

Demystifying Neural Style Transfer

Yanghao Li; Naiyan Wang; Jiaying Liu; Xiaodi Hou

Neural style transfer has recently demonstrated very exciting results, attracting attention in both academia and industry. Despite these results, the principle of neural style transfer, especially why Gram matrices can represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we show theoretically that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style image and the generated image. To further support this standpoint, we experiment with several other distribution alignment methods and achieve appealing results. We believe this interpretation connects these two important research fields and could inspire future research.
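
The central equivalence is easy to check numerically. In the sketch below, random matrices stand in for flattened CNN feature maps (C channels x N spatial positions); it verifies that the squared Frobenius distance between unnormalized Gram matrices equals N^2 times the biased MMD^2 estimate under the second-order polynomial kernel k(u, v) = (u.v)^2, with the N spatial feature vectors treated as samples.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 8, 50                      # channels, spatial positions
Fx = rng.standard_normal((C, N))  # "style" feature map, flattened
Fy = rng.standard_normal((C, N))  # "generated" feature map, flattened

# Gram-matrix style loss (unnormalized Grams).
Gx, Gy = Fx @ Fx.T, Fy @ Fy.T
gram_loss = np.sum((Gx - Gy) ** 2)

# Biased MMD^2 estimate with the kernel k(u, v) = (u.v)^2, treating the
# N column vectors of each feature map as samples.
k_xx = (Fx.T @ Fx) ** 2
k_yy = (Fy.T @ Fy) ** 2
k_xy = (Fx.T @ Fy) ** 2
mmd2 = k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

assert np.isclose(gram_loss, N ** 2 * mmd2)  # equal up to the N^2 factor
```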


International Conference on Acoustics, Speech, and Signal Processing | 2015

Neighborhood regression for edge-preserving image super-resolution

Yanghao Li; Jiaying Liu; Wenhan Yang; Zongming Guo

Many works on image super-resolution employ different priors or external databases to enhance the high-resolution (HR) result. However, most of them do not reconstruct high-frequency image details well, to which the human visual system is most sensitive. Rather than reconstructing all components of the image directly, we propose a novel edge-preserving super-resolution algorithm that reconstructs the low- and high-frequency components separately. In this paper, a neighborhood regression method is proposed to reconstruct high-frequency details on edge maps, while the low-frequency part is reconstructed by traditional bicubic interpolation. We then perform an iterative combination method to obtain the estimated high-resolution result, based on an energy minimization function that contains both a low-frequency consistency term and a high-frequency adaptation term. Extensive experiments evaluate the effectiveness and performance of our algorithm, showing that our method is competitive with or better than state-of-the-art methods.
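
How the iterative combination might look: a gradient-descent sketch under an assumed energy with a low-frequency consistency term and a high-frequency adaptation term. Here B is a Gaussian blur standing in for the low-pass operator, and the exact energy, filter, and optimizer are assumptions rather than the paper's formulation; `low` would come from bicubic interpolation and `high` from the neighborhood regression on edge maps.

```python
import numpy as np
from scipy import ndimage

def combine(low, high, lam=0.5, step=0.2, iters=30):
    """Gradient descent on the assumed energy
        E(H) = ||B(H) - low||^2 + lam * ||(H - B(H)) - high||^2,
    where B is a Gaussian blur: low-frequency consistency plus
    high-frequency adaptation."""
    blur = lambda x: ndimage.gaussian_filter(x, 1.0)
    H = low.copy()
    for _ in range(iters):
        r_low = blur(H) - low            # low-frequency residual
        r_high = (H - blur(H)) - high    # high-frequency residual
        # B and (I - B) are symmetric, so applying the gradient is simple:
        H -= step * (blur(r_low) + lam * (r_high - blur(r_high)))
    return H
```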


International Conference on Acoustics, Speech, and Signal Processing | 2017

Online action detection and forecast via Multitask deep Recurrent Neural Networks

Chunhui Liu; Yanghao Li; Yueyu Hu; Jiaying Liu

Online human action detection and forecasting on untrimmed 3D skeleton sequences is a novel task that builds on traditional action recognition and has not been fully studied. Its aim is to localize and recognize an action in a long sequence while simultaneously forecasting it. In this paper, we propose an online detection algorithm featuring a multi-task recurrent neural network to solve this problem. First, a deep Long Short-Term Memory (LSTM) network is designed for feature extraction and temporal dynamic modeling. We then use a classification subnetwork to classify an action and predict its status at the same time. To forecast the occurrence of actions and accurately estimate their time of occurrence, we incorporate a regression subnetwork into our model. We split the action classes into three stages and train the model by optimizing a joint classification-regression objective function. Experimental results show that the proposed model achieves satisfactory results on online action detection and forecasting.
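
One plausible way to build the per-frame training targets implied by the three-stage split and the forecasting regression is sketched below; the stage definitions, margin, and confidence curve are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def stage_targets(T, start, end, margin=10):
    """Hypothetical per-frame targets for one annotated action [start, end):
    0 = background, 1 = start stage, 2 = in-action, 3 = end stage, plus a
    soft confidence curve that rises before the action so a regression
    subnetwork can forecast its occurrence."""
    labels = np.zeros(T, dtype=int)
    labels[start:end] = 2
    labels[start:min(start + margin, end)] = 1
    labels[max(end - margin, start):end] = 3
    t = np.arange(T)
    conf = np.exp(-0.5 * ((t - start) / margin) ** 2)  # peaks at the start point
    conf[start:end] = 1.0                              # saturates during the action
    return labels, conf

labels, conf = stage_targets(T=100, start=40, end=70)
```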


International Conference on Image Processing | 2015

Multi-pose face hallucination via neighbor embedding for facial components

Yanghao Li; Jiaying Liu; Wenhan Yang; Zongming Guo

In this paper, we propose a novel multi-pose face hallucination method based on Neighbor Embedding for Facial Components (NEFC) to magnify face images with various poses and expressions. To represent the structure of a face, a facial component decomposition is applied to each face image. Then, a locality-constrained neighbor embedding reconstruction is performed for each facial component. For the video scenario, we use optical flow to locate the position of each patch among the neighboring frames and apply the Intra and Inter Nonlocal Means method to preserve consistency between neighboring frames. Experimental results demonstrate the effectiveness and adaptability of our algorithm: it achieves better performance than state-of-the-art methods, especially on face images with various poses and expressions.
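
The per-component reconstruction follows the standard locality-constrained neighbor-embedding recipe, sketched below; the paired LR/HR patch dictionaries and the regularizer are assumptions about the setup rather than the paper's exact formulation.

```python
import numpy as np

def neighbor_embedding(lr_patch, lr_dict, hr_dict, k=5, reg=1e-3):
    """Reconstruct an HR patch with the weights that best reconstruct the
    LR patch from its K nearest LR neighbours (the locality constraint)."""
    dist = np.linalg.norm(lr_dict - lr_patch, axis=1)
    idx = np.argsort(dist)[:k]             # K nearest LR training patches
    Z = lr_dict[idx] - lr_patch            # centre the neighbours on the query
    G = Z @ Z.T + reg * np.eye(k)          # regularised local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                           # LLE-style weights summing to one
    return w @ hr_dict[idx]                # transfer the weights to HR patches
```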


Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities | 2017

PKU-MMD: A Large Scale Benchmark for Skeleton-Based Human Action Understanding

Chunhui Liu; Yueyu Hu; Yanghao Li; Sijie Song; Jiaying Liu

Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on recognition tasks for pre-segmented videos. There is a lack of standard large-scale benchmarks, especially for currently popular data-hungry deep learning methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous skeleton-based human action understanding, covering a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. Our dataset also provides multi-modality data sources, including RGB, depth, infrared radiation, and skeleton. To the best of our knowledge, it is the largest skeleton-based detection dataset so far. We conduct extensive experiments and evaluate different methods on this dataset. We believe this large-scale dataset will benefit future research on action detection for the community.


International Conference on Acoustics, Speech, and Signal Processing | 2016

Joint sub-band based neighbor embedding for image super-resolution

Sijie Song; Yanghao Li; Jiaying Liu; Zongming Guo

In this paper, we propose a novel neighbor embedding method based on joint sub-bands for image super-resolution. Rather than directly reconstructing the total spatial variations of the input image, we restore each frequency component separately. The input LR image is decomposed into sub-bands defined by steerable filters to capture structural details in different directional frequency components. The neighbor embedding principle is then employed to reconstruct each band. Moreover, taking the diverse characteristics of each band into account, we adopt adaptive similarity criteria for searching nearest neighbors. Finally, we recombine the generated HR sub-bands by inverting the sub-band decomposition to obtain the final super-resolved result. Experimental results demonstrate the effectiveness of our method in both objective and subjective quality compared with other state-of-the-art methods.
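
A rough idea of the directional sub-band split: steer a first-derivative-of-Gaussian filter (G_theta = cos(theta)*Gx + sin(theta)*Gy) to a few orientations, standing in for the steerable filters used in the paper; the scale and orientation set here are illustrative.

```python
import numpy as np
from scipy import ndimage

def directional_subbands(img, angles=(0, 45, 90, 135), sigma=1.5):
    """Split an image into directional sub-bands by steering a
    first-derivative-of-Gaussian filter to a few orientations."""
    gx = ndimage.gaussian_filter(img, sigma, order=(0, 1))  # d/dx
    gy = ndimage.gaussian_filter(img, sigma, order=(1, 0))  # d/dy
    return {a: np.cos(np.deg2rad(a)) * gx + np.sin(np.deg2rad(a)) * gy
            for a in angles}
```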


Pacific Rim Conference on Multimedia | 2018

Rethinking Fusion Baselines for Multi-modal Human Action Recognition

Hongda Jiang; Yanghao Li; Sijie Song; Jiaying Liu

In this paper, we study fusion baselines for multi-modal action recognition. Our work explores different strategies for fusing multiple streams. First, we consider early fusion, which fuses the different modal inputs by directly stacking them along the channel dimension. Second, we analyze the late fusion scheme, which fuses the scores from different modal streams. Then, middle fusion at different aggregation stages is explored. In addition, a modal transformation module is developed to adaptively exploit the complementary information in the various modal data. We give a comprehensive analysis of the fusion schemes described above through experimental results and hope our work can benefit the community in multi-modal action recognition.
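
The three baselines differ only in where the modalities are merged, which a toy sketch makes concrete; the tensor shapes and the 51-class output (echoing PKU-MMD above) are illustrative, not the paper's configuration.

```python
import torch

rgb = torch.randn(4, 3, 16, 112, 112)     # batch, channels, frames, H, W
depth = torch.randn(4, 1, 16, 112, 112)

# Early fusion: stack modal inputs along the channel dimension before the network.
early_input = torch.cat([rgb, depth], dim=1)           # (4, 4, 16, 112, 112)

# Late fusion: combine per-modality class scores from separate streams.
scores_rgb, scores_depth = torch.randn(4, 51), torch.randn(4, 51)
late_scores = 0.5 * scores_rgb + 0.5 * scores_depth

# Middle fusion: merge intermediate features at some aggregation stage,
# then feed the fused features to the remaining shared layers.
feat_rgb = torch.randn(4, 256, 4, 14, 14)
feat_depth = torch.randn(4, 256, 4, 14, 14)
mid_feat = torch.cat([feat_rgb, feat_depth], dim=1)    # (4, 512, 4, 14, 14)
```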


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015

Face hallucination based on neighbor embedding via illumination adaptation

Sijie Song; Yanghao Li; Zhihan Gao; Jiaying Liu

In this paper, we present a novel face hallucination method based on neighbor embedding with illumination adaptation (NEIA) to super-resolve faces when the lighting conditions of the training faces mismatch those of the testing face. For illumination adjustment, face alignment is performed through dense correspondence. Next, every training face is decomposed into two layers to extract both detail and highlight components. By operating on the two layers of each face separately, an extended training set is acquired that combines the original faces and adapted faces compensated in illumination. Finally, we reconstruct the input faces through neighbor embedding. To improve the estimation of the neighbor embedding coefficients, nonlocal similarity is taken into consideration. Experimental results show that the proposed method outperforms other state-of-the-art methods in both subjective and objective quality.
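
A rough stand-in for the two-layer split: a smooth base layer carrying large-scale illumination and highlights, and a residual detail layer carrying facial texture. The Gaussian filter here is an assumption; the paper's actual decomposition may differ.

```python
import numpy as np
from scipy import ndimage

def split_layers(face, sigma=8.0):
    """Decompose a face image into a smooth illumination/highlight layer
    and a residual detail layer."""
    base = ndimage.gaussian_filter(face, sigma)  # large-scale lighting
    detail = face - base                         # facial details
    return base, detail
```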


Visual Communications and Image Processing | 2014

Image transformation using limited reference with application to photo-sketch synthesis

Wei Bai; Yanghao Li; Jiaying Liu; Zongming Guo

Image transformation refers to transforming images from a source image space to a target image space. Contemporary image transformation methods achieve this by learning coupled dictionaries from a set of paired images. In practical use, however, such paired training images are not easy to obtain, especially when the target image style is not fixed; in most cases the reference is limited. In this paper, we propose a sparse-representation-based framework for transforming images with limited reference, which can be used for a typical image transformation application, photo-sketch synthesis. In the learning stage, edge features are used to map patches between images of different styles, building the coupled database for dictionary learning. In the reconstruction stage, sparse representation preserves the basic structure of the image content well. In addition, a texture synthesis strategy is introduced to enhance target-like textures in the output image. Experimental results show that the performance of our method is comparable to state-of-the-art methods even with limited reference, making it efficient and less restrictive for practical use.
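
The reconstruction stage can be sketched with coupled dictionaries sharing one sparse code: code the photo patch over the photo dictionary, then decode with the sketch dictionary. The dictionaries are assumed given (the paper learns them from edge-feature-mapped patch pairs), and orthogonal matching pursuit is just one of several possible sparse coders.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def photo_to_sketch_patch(photo_patch, D_photo, D_sketch, n_nonzero=5):
    """Sparse-code the photo patch over the photo dictionary, then reuse the
    same code with the coupled sketch dictionary (dictionaries assumed given,
    with unit-norm columns as atoms)."""
    alpha = orthogonal_mp(D_photo, photo_patch, n_nonzero_coefs=n_nonzero)
    return D_sketch @ alpha
```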

Collaboration


Dive into Yanghao Li's collaborations.

Top Co-Authors

Junliang Xing

Chinese Academy of Sciences


Chunfeng Yuan

Chinese Academy of Sciences
