
Publications


Featured research published by Hao Ye.


International Conference on Multimedia and Expo | 2017

Evolving boxes for fast vehicle detection

Li Wang; Hong Wang; Yingbin Zheng; Hao Ye; Xiangyang Xue

We perform fast vehicle detection from traffic surveillance cameras. We develop a novel deep learning framework, Evolving Boxes, that proposes and refines object boxes under different feature representations. Specifically, our framework is embedded with a light-weight proposal network to generate initial anchor boxes and to discard unlikely regions early; a fine-tuning network then produces detailed features for these candidate boxes. We show that by applying different feature fusion techniques, the initial boxes can be refined for both localization and recognition. We evaluate our network on the recent DETRAC benchmark and obtain a significant improvement over the state-of-the-art Faster RCNN, by 9.5% mAP. Further, our network achieves 9–13 FPS detection speed on a moderate commercial GPU.
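
The propose-then-refine idea can be sketched as two networks: a cheap one that scores dense locations for early rejection, and a deeper one that re-classifies and regresses the surviving boxes. The PyTorch sketch below is illustrative only; the layer sizes, the use of torchvision's roi_align, and the single hand-written candidate box are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class ProposalNet(nn.Module):
    """Light-weight net: coarse features plus per-location objectness
    scores used to discard unlikely regions early."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.objectness = nn.Conv2d(64, 1, 1)

class RefineNet(nn.Module):
    """Deeper head that re-classifies and regresses surviving boxes."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 7 * 7, 256), nn.ReLU())
        self.cls = nn.Linear(256, 2)   # vehicle vs. background
        self.reg = nn.Linear(256, 4)   # box refinement deltas

proposal_net, refine_net = ProposalNet(), RefineNet()
image = torch.rand(1, 3, 256, 256)
feat = proposal_net.features(image)            # [1, 64, 64, 64]
scores = proposal_net.objectness(feat)         # early-rejection scores

# Suppose the top-scoring locations yielded this candidate box
# (first column is the batch index, then x1, y1, x2, y2 in pixels):
boxes = torch.tensor([[0., 10., 10., 120., 90.]])
crops = roi_align(feat, boxes, output_size=7, spatial_scale=feat.shape[-1] / 256)
h = refine_net.head(crops)
cls_logits, box_deltas = refine_net.cls(h), refine_net.reg(h)
```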


ACM Multimedia | 2016

Face Recognition via Active Annotation and Learning

Hao Ye; Weiyuan Shao; Hong Wang; Jianqi Ma; Li Wang; Yingbin Zheng; Xiangyang Xue

In this paper, we introduce an active annotation and learning framework for the face recognition task. Starting with an initial label-deficient face image training set, we iteratively train a deep neural network and use this model to choose the examples for further manual annotation. We follow the active learning strategy and derive the Value of Information criterion to actively select candidate annotation images. During these iterations, the deep neural network is incrementally updated. Experimental results on the LFW benchmark and the MS-Celeb-1M challenge demonstrate the effectiveness of the proposed framework.
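
As an illustration of the selection step, the sketch below ranks unlabeled images by predictive entropy. This is a common uncertainty proxy standing in for the paper's Value of Information criterion, and the toy model and pool sizes are assumptions.

```python
import torch
import torch.nn as nn

def select_for_annotation(model, unlabeled, budget=8):
    """Rank unlabeled images by predictive entropy (an uncertainty
    stand-in for an information-value criterion) and return the
    indices of the images worth sending for manual annotation."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(budget).indices

# One annotation round with a toy classifier over flattened face crops:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 100))
pool = torch.rand(256, 3, 64, 64)              # unlabeled face images
to_label = select_for_annotation(model, pool)  # send these for manual labels
# After annotation, the labeled set grows, the network is incrementally
# updated, and the loop repeats until the annotation budget is exhausted.
```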


IEEE Signal Processing Letters | 2018

Learning Multiviewpoint Context-Aware Representation for RGB-D Scene Classification

Yingbin Zheng; Hao Ye; Li Wang; Jian Pu

Effective visual representation plays an important role in scene classification systems. While many existing methods focus on generic descriptors extracted from the RGB color channels, we argue for the importance of depth context, since scenes exhibit spatial variability and depth is essential to understanding their geometry. In this letter, we present a novel depth representation for RGB-D scene classification based on a specifically designed convolutional neural network (CNN). In contrast to previous deep models that transfer from pretrained RGB CNN models, we train the model with multiviewpoint depth image augmentation to overcome the data scarcity problem. The proposed CNN framework contains dilated convolutions to expand the receptive field and subsequent spatial pooling to aggregate multiscale contextual information. The combination of the contextual design and multiviewpoint depth images is important for a more compact representation, compared to directly using original depth images or off-the-shelf networks. Through extensive experiments on the SUN RGB-D dataset, we demonstrate that the representation outperforms recent state-of-the-art methods, and that combining it with standard CNN-based RGB features leads to further improvements.
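
A minimal sketch of the two ingredients named above, dilated convolutions followed by multiscale spatial pooling, is given below; the channel counts, dilation rates, and pooling scales are illustrative assumptions rather than the letter's exact configuration.

```python
import torch
import torch.nn as nn

class DepthContextNet(nn.Module):
    """Dilated convolutions widen the receptive field; pooling the
    feature map at several scales aggregates multiscale context."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(),
        )
        # Pool the feature map at 1x1, 2x2, and 4x4 and concatenate.
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in (1, 2, 4))
        self.fc = nn.Linear(64 * (1 + 4 + 16), num_classes)

    def forward(self, depth):
        f = self.trunk(depth)
        ctx = torch.cat([p(f).flatten(1) for p in self.pools], dim=1)
        return self.fc(ctx)

# Multiviewpoint augmentation would feed depth images rendered from
# several synthetic viewpoints; a random batch stands in for them here.
logits = DepthContextNet()(torch.rand(2, 1, 128, 128))
```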


Pacific Rim Conference on Multimedia | 2018

Satellite Image Scene Classification via ConvNet With Context Aggregation

Zhao Zhou; Yingbin Zheng; Hao Ye; Jian Pu; Gufei Sun

Scene classification is a fundamental problem in understanding high-resolution remote sensing imagery. Recently, convolutional neural networks (ConvNets) have achieved remarkable performance in different tasks, and significant effort has been made to develop various representations for satellite image scene classification. In this paper, we present a novel representation based on a ConvNet with context aggregation. The proposed two-pathway ResNet (ResNet-TP) architecture adopts ResNet [1] as the backbone, and the two pathways allow the network to model both local details and regional context. The ResNet-TP representation is generated by global average pooling on the last convolutional layers of both pathways. Experiments on two scene classification datasets, UCM Land Use and NWPU-RESISC45, show that the proposed mechanism achieves promising improvements over state-of-the-art methods.
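
The two-pathway design with global average pooling can be sketched roughly as follows, using torchvision's resnet18 as a stand-in backbone (assumes a recent torchvision; the backbone depth and the details of how the two pathways differ are assumptions, not the paper's configuration).

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetTP(nn.Module):
    """Two ResNet pathways whose globally average-pooled features are
    concatenated into a single scene representation."""
    def __init__(self, num_classes=45):
        super().__init__()
        def trunk():
            m = resnet18(weights=None)
            return nn.Sequential(*list(m.children())[:-2])  # conv layers only
        self.detail_path, self.context_path = trunk(), trunk()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512 * 2, num_classes)

    def forward(self, x):
        # In the paper one pathway models regional context; in this
        # simplified sketch both pathways see the same full image.
        a = self.gap(self.detail_path(x)).flatten(1)
        b = self.gap(self.context_path(x)).flatten(1)
        return self.fc(torch.cat([a, b], dim=1))

logits = ResNetTP()(torch.rand(2, 3, 224, 224))
```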


International Conference on Multimedia Retrieval | 2018

Dense Dilated Network for Few Shot Action Recognition

Baohan Xu; Hao Ye; Yingbin Zheng; Heng Wang; Tianyu Luwang; Yu-Gang Jiang

Video action recognition has recently been widely studied. Training deep neural networks requires a large number of well-labeled videos; on the other hand, videos in the same class share high-level semantic similarity. In this paper, we introduce a novel neural network architecture to simultaneously capture local and long-term spatio-temporal information. We propose a dilated dense network whose blocks are composed of densely connected dilated convolution layers. The proposed framework fuses each layer's outputs to learn high-level representations, and the representations are robust even with only a few training snippets. Aggregations of the dilated dense blocks are also explored. We conduct extensive experiments on UCF101 and demonstrate the effectiveness of the proposed method, especially with few training examples.
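
A rough sketch of one densely connected dilated block over 1-D snippet features is shown below; the channel counts and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Each dilated convolution sees the concatenation of the block
    input and every earlier layer's output (dense connectivity)."""
    def __init__(self, channels=64, growth=32, dilations=(1, 2, 4)):
        super().__init__()
        self.layers = nn.ModuleList()
        c = channels
        for d in dilations:
            self.layers.append(nn.Sequential(
                nn.Conv1d(c, growth, 3, padding=d, dilation=d), nn.ReLU()))
            c += growth

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # fuse all previous outputs
        return x

snippets = torch.rand(2, 64, 16)     # [batch, channels, temporal snippets]
out = DenseDilatedBlock()(snippets)  # 64 + 3 * 32 = 160 output channels
```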


International Conference on Multimedia Retrieval | 2018

Precise Temporal Action Localization by Evolving Temporal Proposals

Haonan Qiu; Yingbin Zheng; Hao Ye; Feng Wang; Liang He

Locating actions in long untrimmed videos has been a challenging problem in video content analysis. The performance of existing action localization approaches remains unsatisfactory at precisely determining the beginning and end of an action. Imitating the human perception procedure of observation followed by refinement, we propose a novel three-phase action localization framework. Our framework is embedded with an Actionness Network to generate initial proposals through frame-wise similarity grouping, followed by a Refinement Network that adjusts the boundaries of these proposals. Finally, the refined proposals are sent to a Localization Network for further fine-grained location regression. The whole process can be seen as multi-stage refinement using a novel non-local pyramid feature at various temporal granularities. We evaluate our framework on the THUMOS14 benchmark and obtain a significant improvement over state-of-the-art approaches. In particular, the performance gain is remarkable for precise localization at high IoU thresholds: our framework achieves an mAP of 34.2% at IoU=0.5.
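 
The first phase, turning frame-wise actionness into initial temporal proposals, might look roughly like the sketch below; note that simple score thresholding stands in for the paper's frame-wise similarity grouping, and the Refinement and Localization networks are omitted.

```python
import torch

def group_proposals(actionness, thresh=0.5):
    """Group consecutive frames whose actionness exceeds a threshold
    into initial temporal proposals (start_frame, end_frame)."""
    active = (actionness > thresh).tolist()
    proposals, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t                          # an action segment begins
        elif not a and start is not None:
            proposals.append((start, t - 1))   # the segment just ended
            start = None
    if start is not None:
        proposals.append((start, len(active) - 1))
    return proposals

scores = torch.tensor([0.1, 0.7, 0.9, 0.8, 0.2, 0.6, 0.7, 0.1])
print(group_proposals(scores))  # [(1, 3), (5, 6)]
```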


Pacific Rim Conference on Multimedia | 2017

Indoor Scene Classification by Incorporating Predicted Depth Descriptor

Yingbin Zheng; Jian Pu; Hong Wang; Hao Ye

The depth cue is crucial for perceiving spatial layout and understanding cluttered indoor scenes. However, depth information has seen little use within image scene classification systems, mainly because of the lack of depth labels in existing monocular image datasets. In this paper, we introduce a framework that overcomes this limitation by incorporating a predicted depth descriptor of monocular images for indoor scene classification. The depth prediction model is first learned from an existing RGB-D dataset using a multiscale convolutional network. Given a monocular RGB image, a representation encoding the predicted depth cue is generated. This predicted depth descriptor can be further fused with features from the color channels. Experiments are performed on two indoor scene classification benchmarks, and the quantitative comparisons demonstrate the effectiveness of the proposed scheme.
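
The fusion step can be illustrated with a toy pipeline: predict a depth map from the RGB image, encode RGB and predicted depth separately, and concatenate the descriptors. All components below are hypothetical stand-ins (in particular, a single convolution replaces the paper's multiscale depth-prediction network).

```python
import torch
import torch.nn as nn

# Stand-in for the multiscale depth-prediction network learned on RGB-D data:
depth_predictor = nn.Conv2d(3, 1, 3, padding=1)
rgb_encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(1), nn.Flatten())
depth_encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten())

rgb = torch.rand(4, 3, 224, 224)       # monocular RGB images
pred_depth = depth_predictor(rgb)      # predicted depth cue
# Fuse color and predicted-depth descriptors by concatenation:
descriptor = torch.cat([rgb_encoder(rgb), depth_encoder(pred_depth)], dim=1)
logits = nn.Linear(descriptor.shape[1], 10)(descriptor)  # 10 hypothetical classes
```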


Advanced Video and Signal Based Surveillance | 2017

UA-DETRAC 2017: Report of AVSS2017 & IWT4S Challenge on Advanced Traffic Monitoring

Siwei Lyu; Ming-Ching Chang; Dawei Du; Longyin Wen; Honggang Qi; Yuezun Li; Yi Wei; Lipeng Ke; Tao Hu; Marco Del Coco; Pierluigi Carcagnì; Dmitriy Anisimov; Erik Bochinski; Fabio Galasso; Filiz Bunyak; Guang Han; Hao Ye; Hong Wang; Kannappan Palaniappan; Koray Ozcan; Li Wang; Liang Wang; Martin Lauer; Nattachai Watcharapinchai; Nenghui Song; Noor M. Al-Shakarji; Shuo Wang; Sikandar Amin; Sitapa Rujikietgumjorn; Tatiana Khanova

The rapid advances of transportation infrastructure have led to a dramatic increase in the demand for smart systems capable of monitoring traffic and street safety. Fundamental to these applications are a community-based evaluation platform and benchmark for object detection and multi-object tracking. To this end, we organize the AVSS2017 Challenge on Advanced Traffic Monitoring, in conjunction with the International Workshop on Traffic and Street Surveillance for Safety and Security (IWT4S), to evaluate state-of-the-art object detection and multi-object tracking algorithms in the context of traffic surveillance. Submitted algorithms are evaluated using the large-scale UA-DETRAC benchmark and evaluation protocol. The benchmark, the evaluation toolkit, and the algorithm performance are publicly available from the website http://detrac-db.rit.albany.edu.


Proceedings of the Workshop on Large-Scale Video Classification Challenge | 2017

Large-Scale Video Classification with Elastic Streaming Sequential Data Processing System

Yao Peng; Hao Ye; Yining Lin; Yixin Bao; Zhijian Zhao; Haonan Qiu; Li Wang; Yingbin Zheng

Videos are dominant on the Internet. Current systems for processing large-scale video are suboptimal for two reasons: (1) machine learning modules such as feature extractors and classifiers generate huge intermediate data that place a heavy burden on storage and the network, and (2) task scheduling is explicit, so manually configuring the machine learning modules on the cluster is tedious and inefficient. In this work, we propose the Elastic Streaming Sequential data Processing system (ESSP), which supports automatic task scheduling; multiple machine learning components are automatically parallelized. Further, our system avoids extensive disk I/O by applying an in-memory dataflow scheme. Evaluation on real-world video classification datasets shows many-fold improvements.
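
The in-memory dataflow idea can be illustrated with chained Python generators, where intermediate results stream between stages instead of being written to disk; all stage names below are hypothetical and the sketch bears no relation to ESSP's actual scheduler.

```python
def decode(videos):
    """Stage 1: decode each video into frames (strings stand in for data)."""
    for v in videos:
        yield f"frames({v})"

def extract_features(frames):
    """Stage 2: run the feature extractor on each decoded item."""
    for f in frames:
        yield f"feat({f})"

def classify(features):
    """Stage 3: map features to labels."""
    for f in features:
        yield f"label({f})"

# Stages are chained lazily; intermediate data never touches the disk.
pipeline = classify(extract_features(decode(["vid0.mp4", "vid1.mp4"])))
for label in pipeline:
    print(label)   # label(feat(frames(vid0.mp4))), ...
```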


Chinese Conference on Biometric Recognition | 2016

Compact Face Representation via Forward Model Selection

Weiyuan Shao; Hong Wang; Yingbin Zheng; Hao Ye

This paper proposes a compact face representation for face recognition. The face and its landmark points are detected in the image and then used to generate transformed face regions. Different types of regions form the transformed face region datasets, on which face networks are trained. A novel forward model selection algorithm is designed to simultaneously select complementary face models and generate the compact representation. Employing a public dataset as the training set and fusing only six selected face networks, the recognition system with this compact face representation achieves 99.05% accuracy on the LFW benchmark.
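
Forward model selection can be sketched as a greedy loop that repeatedly adds the network most improving a fused validation score; the scoring function below is a toy placeholder for what would be validation accuracy of the fused representation.

```python
def forward_model_selection(models, score_fn, k=6):
    """Greedy forward selection: repeatedly add the model that most
    improves the fused score until k models are chosen."""
    selected = []
    for _ in range(k):
        remaining = [m for m in models if m not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda m: score_fn(selected + [m]))
        selected.append(best)
    return selected

# Toy usage: ten candidate face networks and a placeholder objective.
candidates = [f"net{i}" for i in range(10)]
score = lambda subset: len(set(subset))  # real use: fused validation accuracy
print(forward_model_selection(candidates, score))
```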

Collaboration


Dive into Hao Ye's collaborations.

Top Co-Authors

Yingbin Zheng (Chinese Academy of Sciences)
Hong Wang (Chinese Academy of Sciences)
Weiyuan Shao (Chinese Academy of Sciences)
Haonan Qiu (East China Normal University)
Dawei Du (Chinese Academy of Sciences)
Feng Wang (East China Normal University)