Publication


Featured research published by Yun Cheng.


Multimedia Tools and Applications | 2017

Stratified pooling based deep convolutional neural networks for human action recognition

Sheng Yu; Yun Cheng; Songzhi Su; Guorong Cai; Shaozi Li

Video-based human action recognition is an active and challenging topic in computer vision. Over the last few years, deep convolutional neural networks (CNNs) have become the most popular method and have achieved state-of-the-art performance on several datasets, such as HMDB-51 and UCF-101. Since each video has a varying number of frame-level features, combining these features into a good video-level feature is a challenging task. This paper therefore proposes a novel action recognition method based on deep convolutional neural networks, named stratified pooling (SP-CNN). The process consists of five main parts: (i) fine-tuning a pre-trained CNN on the target dataset; (ii) frame-level feature extraction; (iii) principal component analysis (PCA) for feature dimensionality reduction; (iv) stratified pooling of frame-level features to obtain a video-level feature; and (v) an SVM for multiclass classification. Experimental results on the HMDB-51 and UCF-101 datasets show that the proposed method outperforms the state of the art.
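The abstract does not define stratified pooling, so the following is only a minimal sketch of step (iv) under a stated assumption: frames are split into contiguous temporal groups ("strata"), each group is mean-pooled, and the pooled vectors are concatenated. The function name `stratified_pool` and the parameter `n_strata` are hypothetical and are not taken from the paper. The point it illustrates is the one the abstract raises: videos with different frame counts end up with video-level features of the same length.

```python
import numpy as np


def stratified_pool(frame_features, n_strata=3):
    """Pool a (n_frames, dim) array into one fixed-length video-level vector.

    Assumption (not from the paper): frames are split into n_strata
    contiguous temporal groups, each group is mean-pooled, and the
    group means are concatenated.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    if frame_features.ndim != 2:
        raise ValueError("expected an array of shape (n_frames, dim)")
    if frame_features.shape[0] < n_strata:
        raise ValueError("need at least one frame per stratum")
    # array_split tolerates frame counts that are not divisible by n_strata.
    groups = np.array_split(frame_features, n_strata, axis=0)
    return np.concatenate([group.mean(axis=0) for group in groups])


# Two videos with different frame counts give vectors of the same length.
short_video = np.random.default_rng(0).normal(size=(17, 8))
long_video = np.random.default_rng(1).normal(size=(250, 8))
print(stratified_pool(short_video).shape, stratified_pool(long_video).shape)
```

In the pipeline described above, such a vector would be computed from PCA-reduced frame features and then passed to the SVM; those stages are omitted here.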


IET Computer Vision | 2017

Fully convolutional networks for action recognition

Sheng Yu; Yun Cheng; Li Xie; Shaozi Li

Human action recognition is an important and challenging topic in computer vision. Recently, convolutional neural networks (CNNs) have established impressive results for many image recognition tasks. CNNs usually contain millions of parameters, which makes them prone to overfitting when trained on small datasets. Therefore, CNNs do not produce superior performance over traditional methods for action recognition. In this study, the authors design a novel two-stream fully convolutional network architecture for action recognition which can significantly reduce the number of parameters while maintaining performance. To exploit spatial-temporal features, a linear weighted fusion method is used to fuse the two-stream networks’ feature maps, and a video pooling method is adopted to construct the video-level features. In addition, the authors demonstrate that improved dense trajectories have a significant impact on action recognition. The authors’ method achieves state-of-the-art performance on two challenging datasets, UCF101 (93.0%) and HMDB51 (70.2%).
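As a rough illustration of the fusion and pooling steps mentioned in this abstract, the sketch below fuses spatial and temporal feature maps with a single scalar weight and then averages over frames and spatial positions. The weight value, the constraint that the two weights sum to 1, the array layout, and the use of mean pooling are all assumptions made for the example; the abstract does not specify them, and `fuse_streams` and `video_pool` are hypothetical names.

```python
import numpy as np


def fuse_streams(spatial_maps, temporal_maps, w_spatial=0.5):
    """Linear weighted fusion of two streams' feature maps.

    Assumption: the two weights sum to 1, so a single scalar suffices.
    """
    spatial_maps = np.asarray(spatial_maps, dtype=float)
    temporal_maps = np.asarray(temporal_maps, dtype=float)
    if spatial_maps.shape != temporal_maps.shape:
        raise ValueError("both streams must produce maps of the same shape")
    return w_spatial * spatial_maps + (1.0 - w_spatial) * temporal_maps


def video_pool(fused_maps):
    """Collapse (n_frames, channels, height, width) maps to a (channels,) vector.

    Assumption: plain averaging over frames and spatial positions.
    """
    return np.asarray(fused_maps, dtype=float).mean(axis=(0, 2, 3))


rng = np.random.default_rng(0)
spatial = rng.normal(size=(10, 16, 7, 7))   # hypothetical spatial-stream maps
temporal = rng.normal(size=(10, 16, 7, 7))  # hypothetical temporal-stream maps
video_feature = video_pool(fuse_streams(spatial, temporal, w_spatial=0.4))
print(video_feature.shape)
```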


Journal of Visual Communication and Image Representation | 2016

Local consistent hierarchical Hough Match for image re-ranking

Yuanzheng Cai; Shaozi Li; Yun Cheng; Rongrong Ji

Highlights: a re-ranking algorithm specific to small vocabularies; hierarchical Hough voting from local to global; compact enough to be stored in mobile visual search systems.

Geometric image re-ranking is a widely adopted step for refining large-scale image retrieval systems built upon popular paradigms such as the Bag-of-Words (BoW) model. Its main idea can be treated as a form of geometric verification that reorders the initial result list returned by a preceding similarity ranking metric, e.g. cosine distance over the BoW vectors between the query image and the reference images. In the literature, to guarantee re-ranking accuracy, most existing schemes require the initial retrieval to be conducted with a large vocabulary (codebook), corresponding to a high-dimensional BoW vector. However, in many emerging applications such as mobile visual search and massive-scale retrieval, the retrieval has to be conducted with a compact BoW vector to meet memory or time requirements. In these scenarios, the traditional re-ranking paradigms are questionable and new algorithms are urgently needed. In this paper, we propose an accurate yet efficient image re-ranking algorithm specific to small vocabularies in the aforementioned scenarios. Our idea is inspired by Hough voting in the transformation space, where votes come from local feature matches. Most notably, this geometric re-ranking can easily be integrated into cutting-edge image-based retrieval systems, yielding superior performance with a small vocabulary, and it is compact enough to be stored on the mobile end, facilitating mobile visual search systems. We further prove that its time complexity is linear in the number of re-ranking instances, which is a significant advantage over existing schemes. In terms of mean Average Precision, we show that its performance is comparable to, and in some cases better than, state-of-the-art re-ranking schemes.
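To make the idea of Hough voting in transformation space concrete, here is a minimal, non-hierarchical sketch. It is a simplification and not the local-to-global scheme of the paper: each local feature match votes for a coarse (scale change, rotation) bin, and the size of the largest bin is used as a geometric consistency score. The match format, the bin sizes, and the function name `hough_score` are assumptions made for illustration. One pass over the matches is enough, so the cost is linear in the number of matches.

```python
import math
from collections import Counter


def hough_score(matches, n_rot_bins=8, scale_step=1.0):
    """Score geometric consistency of local feature matches by Hough voting.

    Each match is (query_scale, query_angle, ref_scale, ref_angle), with
    angles in radians. Every match casts one vote for the quantized
    (log2 scale ratio, rotation difference) it implies; the score is the
    number of votes in the most popular bin.
    """
    votes = Counter()
    for q_scale, q_angle, r_scale, r_angle in matches:
        log_ratio = math.log2(r_scale / q_scale)
        rotation = (r_angle - q_angle) % (2 * math.pi)
        scale_bin = math.floor(log_ratio / scale_step + 0.5)
        # The final modulo guards against a rounding result equal to n_rot_bins.
        rot_bin = int(rotation / (2 * math.pi) * n_rot_bins) % n_rot_bins
        votes[(scale_bin, rot_bin)] += 1
    return max(votes.values(), default=0)


# Three matches agree on "scale x2, rotate by 1 rad"; the fourth is an outlier.
matches = [
    (1.0, 0.0, 2.0, 1.0),
    (2.0, 0.5, 4.0, 1.5),
    (1.5, 1.0, 3.0, 2.0),
    (1.0, 0.0, 1.0, 3.0),
]
print(hough_score(matches))
```

A re-ranker built on this would sort the initial retrieval list by such a score, so that reference images whose matches agree on one transformation move up.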


Sino-Foreign Interchange Conference on Intelligent Science and Intelligent Data Engineering | 2011

A hierarchical clustering based non-maximum suppression method in pedestrian detection

Bing Shuai; Yun Cheng; Shaozi Li; Songzhi Su

We observed that a true positive is surrounded by a dense cluster of detection windows near the geometric center of the pedestrian, so we adopted clustering methods based on an elliptical Euclidean distance to locate pedestrians. Moreover, considering that large and small pedestrians respond differently to the same classifier and that a ‘weak’ true positive (one that fires only a few times) may be filtered out, we partitioned the non-maximum suppression process into two parts to analyze them separately. We call this method hierarchical non-maximum suppression. Experiments showed that our non-hierarchical clustering-based method performed as well as the method proposed by Dalal while consuming much less time (nearly 100-fold less at a magnitude of 150 windows), and that the proposed hierarchical algorithm recalled more true positives than the non-hierarchical method (a 5% higher detection rate at FPPI = 1).


Journal of Visual Communication and Image Representation | 2017

A novel recurrent hybrid network for feature fusion in action recognition

Sheng Yu; Yun Cheng; Li Xie; Zhiming Luo; Min Huang; Shaozi Li

Action recognition in video is one of the most important and challenging tasks in computer vision. How to efficiently combine spatial-temporal information to represent a video plays a crucial role in action recognition. In this paper, a recurrent hybrid network architecture is designed for action recognition by fusing multi-source features: two-stream CNNs for learning semantic features, a two-stream single-layer LSTM for learning long-term temporal features, and an Improved Dense Trajectories (IDT) stream for learning short-term temporal motion features. To mitigate overfitting on small-scale datasets, a video data augmentation method is used to increase the amount of training data, and a two-step training strategy is adopted to train the recurrent hybrid network. Experimental results on two challenging datasets, UCF-101 and HMDB-51, demonstrate that the proposed method reaches state-of-the-art performance.


Optical Engineering | 2016

Method of joint frame synchronization and data-aided channel estimation for 100-Gb/s polarization-division multiplexing–single carrier frequency domain equalization coherent optical transmission systems

Yun Cheng; Jun Tan; Liu Liu; Jing He; Jin Tang; Lin Chen; Jun Zhang; Qiang Li; Minlei Xiao

To improve the performance of channel estimation (CE), a method of joint frame synchronization and data-aided CE using less training overhead is proposed. A 100-Gb/s polarization-division multiplexing coherent transmission system with quaternary phase-shift keying based on the proposed method is demonstrated by simulation. The simulation results show that the proposed method can achieve accurate timing-offset estimation and CE in the presence of strong amplified spontaneous emission noise.


Optics Express | 2017

Multilevel modulation scheme using the overlapping of two light sources for visible light communication with mobile phone camera

Jin Shi; Jing He; Rui Deng; Yiran Wei; Fengting Long; Yun Cheng; Lin Chen

Visible light communication (VLC) with light-emitting diodes (LEDs) is an emerging technology for 5G wireless communications. Recently, the use of a complementary metal-oxide-semiconductor (CMOS) image sensor as the VLC receiver has been developed owing to its flexibility and low cost. However, only two illumination levels, as in an on-off keying (OOK) signal, are typically used. To improve the system throughput and reduce the complexity of the hardware design, in this paper we propose and experimentally demonstrate, for the first time, a multilevel modulation scheme for VLC systems that utilizes the overlapping of two light sources, which are modulated by an OOK signal and a Manchester signal, respectively. At the receiver, a CMOS camera can demodulate the Manchester and OOK signals simultaneously. In addition, a low-pass filter (LPF) is used to enhance the system performance. The experimental results demonstrate that the proposed multilevel modulation scheme can achieve a net data rate of 4.32 kbit/s.
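The overlapping idea can be illustrated with an idealized, noise-free toy model. Everything specific below is an assumption for the sake of the example and is not taken from the paper: both sources run at the same bit rate, the OOK source is twice as bright as the Manchester source, the receiver samples each half-symbol exactly, and a Manchester 1 is sent as high-then-low. Under these assumptions the summed light takes four levels, the Manchester bit is read from the direction of the mid-symbol transition, and the OOK bit is read from the symbol's total intensity, which is loosely what averaging with a low-pass filter would expose.

```python
def overlap(ook_bits, man_bits, a_ook=2.0, a_man=1.0):
    """Return two intensity samples per symbol for the two summed sources."""
    samples = []
    for ook_bit, man_bit in zip(ook_bits, man_bits):
        # Convention assumed here: Manchester 1 -> high, low; 0 -> low, high.
        first, second = (1, 0) if man_bit else (0, 1)
        samples.append(a_ook * ook_bit + a_man * first)
        samples.append(a_ook * ook_bit + a_man * second)
    return samples


def demodulate(samples, a_ook=2.0, a_man=1.0):
    """Recover both bit streams from the summed intensity samples."""
    ook_bits, man_bits = [], []
    for i in range(0, len(samples) - 1, 2):
        first, second = samples[i], samples[i + 1]
        # The Manchester bit is carried by the direction of the transition.
        man_bits.append(1 if first > second else 0)
        # The two halves sum to 2 * a_ook * ook_bit + a_man, so threshold
        # halfway between the ook_bit = 0 and ook_bit = 1 cases.
        ook_bits.append(1 if first + second > a_ook + a_man else 0)
    return ook_bits, man_bits


ook = [1, 0, 1, 1, 0]
man = [0, 0, 1, 0, 1]
signal = overlap(ook, man)
print(sorted(set(signal)))           # distinct intensity levels in use
print(demodulate(signal) == (ook, man))
```

A real camera receiver would also have to deal with rolling-shutter sampling, noise, and synchronization, none of which this sketch models.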


Visual Communications and Image Processing | 2013

Saliency detection by adaptive clustering

Hai Cao; Shaozi Li; Songzhi Su; Yun Cheng; Rongrong Ji

Saliency detection plays an important role in image segmentation, content-aware resizing and object recognition. Recent approaches achieve promising performance, which is useful for post-processing. We propose a clustering-based method to detect refined regions with competitive performance. For coarse-grained classification with an unknown number of clusters, an adaptive algorithm called f-means is developed in this paper. Pixels are clustered by f-means based on color and spatial features, and the centroids are then used to compute their saliency values. Experiments show that our algorithm generates finer maps, which outperform state-of-the-art approaches on the MSRA dataset. Relying on the saliency map, we also obtain superior results in foreground extraction, image resizing and thumbnail generation.


Visual Communications and Image Processing | 2013

Decomposed human localization in personal photo albums

Bing Shuai; Songzhi Su; Shaozi Li; Yun Cheng; Rongrong Ji

Recent years have seen tremendous progress in human detection, although usually only upright poses are considered. In this paper, we relax this constraint and localize highly deformable persons, as commonly seen in personal photo albums. Human localization under arbitrary poses is extremely challenging because the large pose variance defeats traditional part-based template detectors. To tackle this issue, we propose a decomposition-based human localization model that works in three steps: first, a stable upper body is detected; then a set of larger bounding boxes is extended from it; finally, the most appropriate instance is selected by a discriminative Whole Person Model. The experimental results demonstrate that our decomposition-based model works very well at localizing deformable persons, boosting the average precision by 10% compared to state-of-the-art person detectors. In addition, the Similar Pose Feature (SPF) makes it feasible to project persons with similar poses into the same clusters, enabling a novel pose-based photo album browsing function.


International Conference on Information Science, Electronics and Electrical Engineering | 2014

An improved 3D Bilinear Multidimensional Morphable Models used in 3D face recognition

Liying Wang; Bixia Liu; Songzhi Su; Yun Cheng; Shaozi Li

Collaboration


Dive into Yun Cheng's collaboration.

Top Co-Authors

Minlei Xiao

Hunan University of Humanities

Li Xie

Hunan University of Humanities
