Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hang Su is active.

Publication


Featured research published by Hang Su.


International Conference on Computer Vision | 2015

Multi-view Convolutional Neural Networks for 3D Shape Recognition

Hang Su; Subhransu Maji; Evangelos Kalogerakis; Erik G. Learned-Miller

A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grid or polygon mesh, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors. Recognition rates further increase when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single and compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.
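The view-pooling idea at the core of the multi-view architecture can be illustrated as an element-wise max across per-view feature vectors. This is a minimal NumPy sketch of the pooling step only; the function name `view_pool` and the toy features are illustrative, and in the paper the pooling sits inside the network between convolutional stages:

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max-pooling across per-view feature vectors.

    view_features: (n_views, d) array, one row per rendered view.
    Returns a single (d,) descriptor for the whole 3D shape.
    """
    return view_features.max(axis=0)

# Toy example: 3 rendered views with 4-dimensional features each.
views = np.array([[0.1, 0.9, 0.2, 0.0],
                  [0.8, 0.1, 0.3, 0.5],
                  [0.2, 0.4, 0.7, 0.1]])
descriptor = view_pool(views)
```

The max keeps, per dimension, the strongest response over all views, which is what makes the descriptor invariant to which views happen to be rendered.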


International Journal of Computer Vision | 2017

Crowd Behavior Analysis via Curl and Divergence of Motion Trajectories

Shuang Wu; Hua Yang; Shibao Zheng; Hang Su; Yawen Fan; Ming-Hsuan Yang

In the field of crowd behavior analysis, existing methods mainly focus on using local representations inspired by models found in other disciplines (e.g., fluid dynamics and social dynamics) to describe motion patterns. However, less attention is paid to exploiting motion structures (e.g., visual information contained in trajectories) for behavior analysis. In this paper, we consider both local characteristics and global structures of a motion vector field, and propose the Curl and Divergence of motion Trajectories (CDT) descriptors to describe collective motion patterns. To this end, a trajectory-based motion coding algorithm is designed to extract the CDT descriptors. For each motion vector field we construct its conjugate field, in which each vector is perpendicular to its counterpart in the original vector field. The trajectories in the motion and conjugate fields indicate the tangential and radial motion structures, respectively. By integrating curl along the tangential paths and divergence along the radial paths, the CDT descriptors are extracted. We show that the proposed motion descriptors are scale- and rotation-invariant for effective crowd behavior analysis. For concreteness, we apply the CDT descriptors to identify five typical crowd behaviors (lane, clockwise arch, counterclockwise arch, bottleneck and fountainhead) with a pipeline including motion decomposition. Extensive experimental results on two benchmark datasets demonstrate the effectiveness of the CDT descriptors for describing and classifying crowd behaviors.
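The curl and divergence quantities underlying the CDT descriptors can be illustrated with plain finite differences on a dense 2D flow field. This is a NumPy sketch of the raw quantities only, not the paper's trajectory-based coding; the function name is illustrative:

```python
import numpy as np

def curl_divergence(u, v):
    """Finite-difference curl and divergence of a 2D motion field.

    u, v: (H, W) arrays holding the x- and y-components of the flow.
    Returns (curl, div), each of shape (H, W).
    """
    du_dy, du_dx = np.gradient(u)   # axis 0 is y, axis 1 is x
    dv_dy, dv_dx = np.gradient(v)
    curl = dv_dx - du_dy            # local rotation (out-of-plane component)
    div = du_dx + dv_dy             # local expansion or contraction
    return curl, div

# A pure rotation field (u, v) = (-y, x) has constant curl 2 and zero divergence.
ys, xs = np.mgrid[-2:3, -2:3].astype(float)
curl, div = curl_divergence(-ys, xs)
```

Rotational patterns (the arch behaviors) show up in the curl map, while bottleneck and fountainhead patterns show up as negative and positive divergence, respectively.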


Sensors | 2013

Video Sensor-Based Complex Scene Analysis with Granger Causality

Yawen Fan; Hua Yang; Shibao Zheng; Hang Su; Shuang Wu

In this paper, we propose a novel framework to explore the activity interactions and temporal dependencies between activities in complex video surveillance scenes. Under our framework, a low-level codebook is generated by adaptive quantization with respect to the activeness criterion. The Hierarchical Dirichlet Processes (HDP) model is then applied to automatically cluster low-level features into atomic activities. Afterwards, the dynamic behaviors of the activities are represented as a multivariate point process. The pairwise relationships between activities are explicitly captured by non-parametric Granger causality analysis, from which the activity interactions and temporal dependencies are discovered. Each video clip is then labeled with one of the activity interactions. Experimental results on real-world traffic datasets show that the proposed method achieves high-quality classification performance. Compared with traditional K-means clustering, a maximum improvement of 19.19% is achieved by using the proposed causal grouping method.
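The pairwise Granger test can be illustrated with an ordinary least-squares autoregression on synthetic series: if adding x's past noticeably shrinks the prediction error of y, x is said to Granger-cause y. A minimal NumPy sketch, noting that the paper uses a non-parametric variant on point-process data, and `granger_gain` and the toy series are illustrative:

```python
import numpy as np

def _resid_var(X, Y):
    """Mean squared residual of a least-squares fit of Y on X (plus intercept)."""
    X = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return ((Y - X @ beta) ** 2).mean()

def granger_gain(x, y, lag=2):
    """log(restricted / full) residual variance when predicting y:
    the restricted model uses only y's past; the full model adds x's past."""
    n = len(y)
    Y = y[lag:]
    own = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    both = np.column_stack([own] + [x[lag - k:n - k] for k in range(1, lag + 1)])
    return np.log(_resid_var(own, Y) / _resid_var(both, Y))

# Synthetic pair: y is driven by x's previous value, but not vice versa.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

gain_xy = granger_gain(x, y)   # large: x's past helps predict y
gain_yx = granger_gain(y, x)   # near zero: y's past does not help predict x
```

The asymmetry of the two gains is what lets the framework recover directed temporal dependencies between activities.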


International Conference on Image Processing | 2016

Motion sketch based crowd video retrieval via motion structure coding

Shuang Wu; Hang Su; Shibao Zheng; Hua Yang; Qin Zhou

Crowd video retrieval is an important problem in surveillance video management in the era of big data, e.g., video indexing and browsing. In this paper, we address this issue from the motion-level perspective by using hand-drawn sketches as queries. Motion sketch based crowd video retrieval naturally suffers from challenges in motion-level video indexing and sketch representation. We tackle them by leveraging the motion structure coding algorithm to extract robust structure-preserved motion descriptors. For video indexing, we use motion decomposition to separate the sub-motion vector fields with typical patterns from a set of optical flows. Then, the motion-level descriptors of the vector fields are computed and stored in the index database. To represent sketch queries, we propose a sketch vectorization algorithm followed by motion structure coding. In the retrieval stage, given a new query, the retrieval function learned by the Ranking SVM algorithm predicts the ranking score of each motion pattern in the index database. Extensive experiments are conducted on the publicly available crowd datasets, which demonstrate the robustness and effectiveness of the proposed sketch based crowd video retrieval system.


Pattern Recognition | 2017

Joint dictionary and metric learning for person re-identification

Qin Zhou; Shibao Zheng; Haibin Ling; Hang Su; Shuang Wu

Matching people across non-overlapping camera views, known as person re-identification, is of great importance for long-term pedestrian tracking in smart surveillance systems. Among various algorithms for person re-identification, dictionary learning is frequently utilized to build robust feature representations for images across different camera views. Metric learning, on the other hand, is usually exploited to find an optimal feature subspace that maximizes the inter-person divergence while minimizing the intra-person divergence. Although both representative features and discriminative metrics have great impact on the performance of person re-identification methods, most existing algorithms focus on only one of the two aspects. In this paper, by explicitly modeling discriminative metric learning into the dictionary learning procedure, we formulate robust feature representation learning and discriminative metric learning in a unified framework. To alleviate the amount bias towards hard negative pairs in metric learning, instance selection is conducted for hard negative mining during the similarity constraint formulation. Besides, we derive closed-form solutions for the dictionary and coefficient updates. Extensive experiments on three challenging datasets, as well as cross-dataset experiments, demonstrate the effectiveness and generalization ability of the joint dictionary and metric learning framework.
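The metric-learning half of the framework rests on a learned Mahalanobis-style distance d(x, y) = (x - y)^T M (x - y) with M = L^T L, which is a Euclidean distance after projecting the difference by L. A tiny sketch of that distance alone, with a hand-picked L; the dictionary-learning half and the joint optimization are omitted:

```python
import numpy as np

def mahalanobis(x, y, L):
    """Learned metric d(x, y) = (x - y)^T M (x - y) with M = L^T L,
    i.e. squared Euclidean distance after projecting the difference by L."""
    d = L @ (x - y)
    return float(d @ d)

# A hand-picked L that weights the first feature and ignores the second:
# pairs differing only in feature 2 (e.g. illumination) score as identical,
# which is the kind of invariance a learned metric aims for.
L = np.array([[2.0, 0.0]])
same_pair = mahalanobis(np.array([1.0, 0.0]), np.array([1.0, 5.0]), L)
diff_pair = mahalanobis(np.array([1.0, 0.0]), np.array([2.0, 0.0]), L)
```

In the paper, L is learned jointly with the dictionary so that the metric acts on the sparse codes rather than raw features.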


Journal of Visual Communication and Image Representation | 2017

Bilinear dynamics for crowd video analysis

Shuang Wu; Hang Su; Hua Yang; Shibao Zheng; Yawen Fan; Qin Zhou

In this paper, a novel crowd descriptor, termed the bilinear CD (Curl and Divergence) descriptor, is proposed based on the bilinear interaction of curl and divergence. Specifically, the curl and divergence activation maps are computed from the normalized average flow. A local curl patch and the corresponding divergence patch are cropped from the respective activation maps. The outer product of the two local patches is defined as the bilinear CD vector. By sliding a window over the activation maps, we obtain hundreds to thousands of local bilinear CD vectors. To encode them into a compact representation, Fisher vector pooling and PCA are applied to the local descriptors. Experiments on the CUHK crowd dataset show that the proposed bilinear dynamics improve the performance of video classification and retrieval by a noticeable margin compared with existing crowd features.
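The local descriptor extraction, a sliding window whose curl and divergence patches are combined by an outer product, can be sketched directly. This is a minimal NumPy sketch with illustrative names; the Fisher vector pooling and PCA stages are omitted:

```python
import numpy as np

def bilinear_cd(curl_map, div_map, win=3, step=3):
    """Local bilinear CD vectors: outer product of co-located patches.

    Slides a win x win window over both activation maps and, at each
    location, flattens the two patches and takes their outer product.
    Returns an (n_windows, win**4) array of local descriptors.
    """
    H, W = curl_map.shape
    vecs = []
    for i in range(0, H - win + 1, step):
        for j in range(0, W - win + 1, step):
            c = curl_map[i:i + win, j:j + win].ravel()
            d = div_map[i:i + win, j:j + win].ravel()
            vecs.append(np.outer(c, d).ravel())
    return np.array(vecs)

# Toy 6x6 activation maps: 4 non-overlapping 3x3 windows, 81-dim vectors.
curl_map = np.arange(36.0).reshape(6, 6)
div_map = np.ones((6, 6))
local = bilinear_cd(curl_map, div_map)
```

The outer product captures every pairwise interaction between curl and divergence responses in the window, which is what "bilinear" refers to.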


British Machine Vision Conference | 2015

Kernelized View Adaptive Subspace Learning for Person Re-identification

Qin Zhou; Shibao Zheng; Hang Su; Hua Yang; Yu Wang; Shuang Wu

Person re-identification refers to the task of recognizing the same person under different non-overlapping camera views and across different times and places. Many successful methods exploit complex feature representations or sophisticated learners. A recent trend is to learn a suitable distance metric that minimizes the distance between true matches while maximizing the distance between mismatched pairs. However, most existing metric learning algorithms directly take the difference of pairwise features in the original feature space as input. By doing so, they implicitly assume that there exists a projection matrix that can map feature vectors from two different subspaces into an identical subspace where the desired feature distribution (features of the same person lie close together, and far apart otherwise) can be achieved. In this paper, we propose to learn different projection matrices for different camera views, so that the learned matrices adapt to each camera view and a common subspace satisfying the desired feature distribution is more likely to be found. To better accommodate the different variations encountered by different views, the kernel trick is adopted to capture more information and allow nonlinear transformations. During the test phase, the features under different camera views are projected into the learned subspace and a simple nearest-neighbor classification is performed. Extensive experiments on four challenging datasets (VIPeR, iLIDS, CAVIAR4REID and ETHZ) demonstrate the effectiveness of the proposed algorithm.
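The test-phase idea, view-specific projections into a common subspace followed by nearest-neighbor matching, can be sketched linearly, without the kernel trick. Everything here is illustrative: the projection matrices are hand-constructed rather than learned, and `match` is not the paper's API:

```python
import numpy as np

def match(probe, gallery, P_probe, P_gallery):
    """Project each camera view with its own matrix, then nearest-neighbour
    match in the shared subspace (test phase only)."""
    A = probe @ P_probe.T
    B = gallery @ P_gallery.T
    dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy setup: the probe view distorts gallery features by a known matrix M.
# Using view-specific projections (inv(M) for probe, identity for gallery)
# undoes the distortion and recovers the correct identities.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 4))
M = np.eye(4)
M[0, 1], M[2, 3] = 0.5, -0.4
probe = gallery @ M.T + 0.01 * rng.normal(size=(5, 4))
pred = match(probe, gallery, np.linalg.inv(M), np.eye(4))
```

A single shared projection could not undo a view-specific distortion like M, which is the motivation for learning one matrix per view.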


Multimedia Tools and Applications | 2017

Motion sketch based crowd video retrieval

Shuang Wu; Hua Yang; Shibao Zheng; Hang Su; Qin Zhou; Xu Lu

Crowd video retrieval with desired motion flow segmentation is an important problem in surveillance video management, e.g., video indexing and browsing, especially in the age of big data. In this paper, we address this issue from the motion-level perspective by using hand-drawn sketches as queries. Motion sketch based crowd video retrieval naturally suffers from challenges in crowd motion representation and similarity measurement. To tackle them, we propose to (1) leverage the motion structure coding algorithm for motion-level video indexing and hand-drawn sketch representation and (2) exploit distance metric fusion strategy incorporated with Ranking SVM for measuring the relevant degree between a sketch query and crowd videos. Specifically, for video indexing, motion decomposition is utilized to separate sub-motion vector fields with typical patterns from a set of optical flows. Then, the motion-level descriptors of the vector fields are computed and stored in an index database. To represent motion sketches, we propose a mechanism by vectorizing the sketches followed by motion structure coding. In the retrieval stage, we first compute the pairwise distance with different metrics between a new sketch query and crowd videos, and then stack them into a feature vector as the input of the Ranking SVM algorithm. Finally, we use the learned retrieval model to predict the ranking score of each crowd video in the database. Experimental results on the publicly available crowd datasets show the robustness and effectiveness of the proposed sketch based crowd video retrieval system.
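The distance-metric fusion step can be sketched by stacking several metrics between a sketch query and each indexed video into one feature vector, which is what the Ranking SVM would then score. A NumPy sketch with three common metrics; the metrics actually used in the paper and the SVM training itself are omitted, and the names are illustrative:

```python
import numpy as np

def fuse_distances(query_desc, video_descs):
    """Stack several distance metrics between a query descriptor and each
    video descriptor into one feature vector per video."""
    feats = []
    for v in video_descs:
        diff = query_desc - v
        cos = 1 - query_desc @ v / (np.linalg.norm(query_desc) * np.linalg.norm(v))
        feats.append([np.abs(diff).sum(),           # L1 distance
                      np.sqrt((diff ** 2).sum()),   # L2 distance
                      cos])                         # cosine distance
    return np.array(feats)

# Toy query against two indexed videos: an exact match and an orthogonal one.
q = np.array([1.0, 0.0, 1.0])
videos = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
feat = fuse_distances(q, videos)
```

The learned ranking model then weights these metrics, rather than committing to any single one in advance.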


International Conference on Acoustics, Speech, and Signal Processing | 2016

Joint instance and feature importance re-weighting for person reidentification

Qin Zhou; Shibao Zheng; Hua Yang; Yu Wang; Hang Su

Person reidentification refers to the task of recognizing the same person under different non-overlapping camera views. Among various techniques, metric learning has proven effective: it exploits labeled data to learn a subspace that maximizes the inter-person divergence while minimizing the intra-person divergence. However, these methods fail to take the different impacts of various instances and local features into account. To address this issue, we propose to learn a projection matrix such that the importance of different instances and local features is re-weighted jointly. We also derive a simplified formulation of the proposed algorithm, so that it can be solved by the efficient UDFS optimization algorithm. Extensive experiments on the VIPeR and iLIDS datasets demonstrate the effectiveness and efficiency of our algorithm.


International Conference on Multimedia and Expo | 2015

Towards active annotation for detection of numerous and scattered objects

Hang Su; Hua Yang; Shibao Zheng; Sha Wei; Yu Wang; Shuang Wu

Object detection is an active research area in computer vision and image understanding. In this paper, we propose an active annotation algorithm addressing the detection of numerous and scattered objects in a view, e.g., hundreds of cells in microscopy images. In particular, object detection is implemented by classifying pixels into specific classes with graph-based semi-supervised learning and grouping neighboring pixels with the same label. Sample (seed) selection is conducted based on a novel annotation criterion that minimizes the expected prediction error. The most informative samples are therefore annotated actively and subsequently propagated to the unlabeled samples via a pairwise affinity graph. Experimental results on two real-world datasets validate that the proposed scheme quickly reaches high-quality results and significantly reduces human effort.
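The graph-based semi-supervised step can be sketched as iterative label propagation over a pairwise affinity graph, with the annotated seeds clamped at every iteration. A minimal NumPy sketch; the paper's active seed-selection criterion is omitted, and `propagate` and the chain graph are illustrative:

```python
import numpy as np

def propagate(W, labels, n_iter=50):
    """Simple graph label propagation with clamped seeds.

    W: (n, n) symmetric affinity matrix; labels: length-n integer array
    with class ids for annotated seeds and -1 for unlabeled nodes.
    Returns the predicted class for every node.
    """
    classes = np.unique(labels[labels >= 0])
    seed = labels >= 0
    F = np.zeros((len(labels), len(classes)))
    F[seed] = np.eye(len(classes))[labels[seed]]
    P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    for _ in range(n_iter):
        F = P @ F                          # average over graph neighbours
        F[seed] = np.eye(len(classes))[labels[seed]]   # re-clamp the seeds
    return classes[F.argmax(axis=1)]

# Toy chain graph 0-1-2-3-4: node 0 is annotated class 0, nodes 3 and 4
# class 1; labels flow outward from the seeds to the unlabeled nodes.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
labels = np.array([0, -1, -1, 1, 1])
pred = propagate(W, labels)
```

Clamping the seeds every iteration is what keeps the annotated pixels fixed while their labels diffuse along the affinity graph.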

Collaboration


Dive into Hang Su's collaborations.

Top Co-Authors

Shibao Zheng, Shanghai Jiao Tong University
Hua Yang, Shanghai Jiao Tong University
Shuang Wu, Shanghai Jiao Tong University
Qin Zhou, Shanghai Jiao Tong University
Yawen Fan, Shanghai Jiao Tong University
Yu Wang, Shanghai Jiao Tong University
Erik G. Learned-Miller, University of Massachusetts Amherst
Evangelos Kalogerakis, University of Massachusetts Amherst
Subhransu Maji, University of Massachusetts Amherst