Sijie Song
Peking University
Publications
Featured research published by Sijie Song.
Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities | 2017
Chunhui Liu; Yueyu Hu; Yanghao Li; Sijie Song; Jiaying Liu
Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition for segmented videos. There is a lack of standard large-scale benchmarks, especially for currently popular data-hungry deep-learning-based methods. In this paper, we introduce a new large-scale benchmark (PKU-MMD) for continuous skeleton-based human action understanding that covers a wide range of complex human activities with well-annotated information. PKU-MMD contains 1076 long video sequences in 51 action categories, performed by 66 subjects in three camera views. It contains almost 20,000 action instances and 5.4 million frames in total. Our dataset also provides multi-modality data sources, including RGB, depth, infrared radiation, and skeleton. To the best of our knowledge, it is the largest skeleton-based detection database so far. We conduct extensive experiments and evaluate different methods on this dataset. We believe this large-scale dataset will benefit future research on action detection in the community.
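Since the benchmark targets action detection in continuous videos, a predicted temporal segment is typically scored against a ground-truth segment with a temporal intersection-over-union. The snippet below is only a minimal sketch of that common measure, not the benchmark's official evaluation code.

def temporal_iou(pred, gt):
    """Overlap between two (start_frame, end_frame) intervals."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

# e.g. temporal_iou((100, 220), (120, 240)) ~= 0.71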
international conference on acoustics, speech, and signal processing | 2016
Sijie Song; Yanghao Li; Jiaying Liu; Zongming Guo
In this paper, we propose a novel neighbor embedding method based on joint sub-bands for image super-resolution. Rather than directly reconstructing the total spatial variations of the input image, we restore each frequency component separately. The input LR image is decomposed into sub-bands defined by steerable filters to capture structural details in different directional frequency components. The neighbor embedding principle is then employed to reconstruct each band. Moreover, taking the diverse characteristics of each band into account, we adopt adaptive similarity criteria for the nearest-neighbor search. Finally, we recombine the generated HR sub-bands by applying the inverse sub-band decomposition to obtain the final super-resolved result. Experimental results demonstrate the effectiveness of our method in both objective and subjective quality compared with other state-of-the-art methods.
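As a rough illustration of the neighbor-embedding step applied to each sub-band, the sketch below estimates one HR patch from the HR counterparts of the k nearest LR dictionary patches using LLE-style weights. The function and variable names are assumptions for illustration; the steerable-filter decomposition and the adaptive similarity criteria from the paper are not reproduced here.

import numpy as np

def neighbor_embed_patch(lr_patch, lr_dict, hr_dict, k=5, eps=1e-6):
    """Estimate an HR patch as a weighted combination of the HR counterparts
    of the k nearest LR dictionary patches (LLE-style weights)."""
    # k nearest neighbors under Euclidean distance (plain L2 used here for brevity)
    dists = np.linalg.norm(lr_dict - lr_patch, axis=1)
    idx = np.argsort(dists)[:k]
    neighbors = lr_dict[idx]                     # shape (k, d)

    # Solve for weights w minimizing ||lr_patch - w @ neighbors||^2 with sum(w) = 1
    diff = neighbors - lr_patch                  # shape (k, d)
    G = diff @ diff.T + eps * np.eye(k)          # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()

    # Transfer the same weights to the corresponding HR patches
    return w @ hr_dict[idx]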
international conference on acoustics, speech, and signal processing | 2016
Shuai Yang; Jiaying Liu; Sijie Song; Mading Li; Zongming Guo
In this paper, we propose a novel hierarchical image completion approach that uses regularity statistics and takes structure features into account. Guided by dominant structures, the target image is used to generate reference images in a self-reproductive way through image data enhancement. This structure-guided enhancement allows us to expand the search space for samples. A Markov Random Field model then guides the combination of the enhanced image data to globally reconstruct the target image. For lower computational complexity and more accurate structure estimation, a hierarchical process is implemented. Experiments demonstrate the effectiveness of our method compared with several state-of-the-art image completion techniques.
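One way to read the structure-guided, self-reproductive reference generation is as shifting the target image along a dominant repetition direction so that recurring patterns supply extra source patches. The sketch below illustrates that reading under assumed inputs (the displacement vector and offsets); the MRF-based combination step is not shown.

import numpy as np

def generate_references(image, direction, offsets):
    """Return copies of `image` shifted by multiples of a dominant-structure
    displacement vector `direction` = (dx, dy)."""
    refs = []
    dx, dy = direction
    for t in offsets:
        # np.roll wraps around the border; a real implementation would mask
        # the wrapped region before using it as a sample source
        refs.append(np.roll(image, shift=(t * dy, t * dx), axis=(0, 1)))
    return refs

# Example: a texture repeating every 16 pixels horizontally
# refs = generate_references(img, direction=(16, 0), offsets=[-2, -1, 1, 2])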
pacific rim conference on multimedia | 2018
Hongda Jiang; Yanghao Li; Sijie Song; Jiaying Liu
In this paper, we study fusion baselines for multi-modal action recognition. Our work explores different strategies for multi-stream fusion. First, we consider early fusion, which fuses the different modal inputs by directly stacking them along the channel dimension. Second, we analyze the late fusion scheme, which fuses the scores from the different modal streams. We then explore middle fusion at different aggregation stages. In addition, a modal transformation module is developed to adaptively exploit the complementary information in the various modal data. We give a comprehensive analysis of the fusion schemes described above through experimental results and hope our work will benefit the community in multi-modal action recognition.
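To make the early/late distinction concrete, here is a minimal PyTorch sketch contrasting the two schemes on RGB and depth inputs. The toy backbone, channel counts, and class count are illustrative assumptions rather than the paper's networks, and the middle-fusion and modal-transformation variants are not shown.

import torch
import torch.nn as nn

def small_backbone(in_ch, num_classes):
    # Tiny stand-in classifier used by both fusion variants
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes))

class EarlyFusion(nn.Module):
    """Stack the modal inputs along the channel dimension, then classify."""
    def __init__(self, num_classes=51):
        super().__init__()
        self.net = small_backbone(3 + 1, num_classes)  # RGB + depth channels

    def forward(self, rgb, depth):
        return self.net(torch.cat([rgb, depth], dim=1))

class LateFusion(nn.Module):
    """Run one stream per modality and average the class scores."""
    def __init__(self, num_classes=51):
        super().__init__()
        self.rgb_stream = small_backbone(3, num_classes)
        self.depth_stream = small_backbone(1, num_classes)

    def forward(self, rgb, depth):
        return 0.5 * (self.rgb_stream(rgb) + self.depth_stream(depth))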
international conference on pattern recognition | 2016
Shuai Yang; Sijie Song; Qikun Guo; Xiaoqing Lu; Jiaying Liu
The simple yet subtle structures of faces make it difficult to capture the fine differences between facial regions in a depth map, especially for consumer devices like Kinect. To address this issue, we present a novel method to super-resolve and recover facial depth maps. The key idea of our approach is to use a learning-based method to obtain reliable face priors from high-quality facial depth maps and exploit them to further improve the depth image. Specifically, we utilize the neighbor embedding framework. First, face components are separated so that specialized dictionaries can be trained and each component reconstructed individually. Joint features, i.e., color, depth, and position cues, are put forward for robust patch similarity measurement. The neighbor embedding results form high-frequency cues of facial depth details and gradients. Finally, an optimization function is defined to combine this high-frequency information and yield depth maps that better fit the actual face structures. Experimental results demonstrate the superiority of our method compared with state-of-the-art techniques in recovering both synthetic data and real-world data from Kinect.
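The joint-feature patch similarity mentioned above can be pictured as a weighted sum of color, depth, and position distances. The sketch below shows that idea with assumed weights and patch layout, not the exact measure used in the paper.

import numpy as np

def joint_patch_distance(p, q, w_color=1.0, w_depth=1.0, w_pos=0.1):
    """p and q are dicts holding flattened 'color' and 'depth' patches plus
    'pos', the patch-centre coordinates; the weights are illustrative only."""
    d_color = np.linalg.norm(p['color'] - q['color'])
    d_depth = np.linalg.norm(p['depth'] - q['depth'])
    d_pos = np.linalg.norm(np.asarray(p['pos']) - np.asarray(q['pos']))
    return w_color * d_color + w_depth * d_depth + w_pos * d_pos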
asia pacific signal and information processing association annual summit and conference | 2015
Mading Li; Wenhan Yang; Sijie Song; Zongming Guo
In this paper, we propose a novel error concealment method based on multiscale patch clustering and low-rank minimization. To collect more reliable patches and form a genuinely low-rank matrix, an image pyramid is built using an effective down-sampling process. Classic singular value thresholding (SVT) is modified into a global iteration to solve the low-rank minimization problem. Extensive experimental results on random pixel loss and block loss scenarios validate the effectiveness of the proposed method, which achieves higher PSNR and better visual quality than state-of-the-art low-rank based error concealment methods.
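For reference, the core SVT operation named above is the proximal operator of the nuclear norm: soft-threshold the singular values of the matrix formed by stacking similar patches. The sketch below shows that single step in NumPy; the threshold tau, the multiscale patch clustering, and the global iteration around it are not part of this snippet.

import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink each singular value of M by tau,
    returning a low-rank estimate of the stacked-patch matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt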
asia pacific signal and information processing association annual summit and conference | 2015
Sijie Song; Yanghao Li; Zhihan Gao; Jiaying Liu
In this paper, we present a novel face hallucination method based on neighbor embedding with illumination adaptation (NEIA) to super-resolve faces when the lighting conditions of the training faces do not match those of the testing face. For illumination adjustment, face alignment is performed through dense correspondence. Next, every training face is decomposed into two layers to extract detail and highlight components. By operating on the two layers of each face separately, an extended training set is obtained that combines the original faces with illumination-compensated adapted faces. Finally, we reconstruct the input faces through neighbor embedding. To improve the estimation of the neighbor embedding coefficients, nonlocal similarity is taken into consideration. Experimental results show that the proposed method outperforms other state-of-the-art methods in both subjective and objective quality.
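The two-layer split of each training face can be approximated by separating a smooth illumination-dominated layer from a high-frequency detail layer. The sketch below uses a Gaussian low-pass filter for this, which is an assumption for illustration rather than the decomposition specified in the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def two_layer_decompose(face, sigma=5.0):
    """Split a face image into a smooth, illumination-dominated base layer
    and a high-frequency detail layer (detail = face - base)."""
    face = face.astype(np.float64)
    base = gaussian_filter(face, sigma=sigma)
    detail = face - base
    return base, detail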
national conference on artificial intelligence | 2017
Sijie Song; Cuiling Lan; Junliang Xing; Wenjun Zeng; Jiaying Liu
arXiv: Computer Vision and Pattern Recognition | 2017
Chunhui Liu; Yueyu Hu; Yanghao Li; Sijie Song; Jiaying Liu
IEEE Transactions on Image Processing | 2018
Sijie Song; Cuiling Lan; Junliang Xing; Wenjun Zeng; Jiaying Liu