Xiangyuan Lan
Hong Kong Baptist University
Publications
Featured research published by Xiangyuan Lan.
Computer Vision and Pattern Recognition | 2014
Xiangyuan Lan; Andy Jinhua Ma; Pong Chi Yuen
The use of multiple features for tracking has proven effective because the limitations of each feature can be compensated by the others. Since different types of variations such as illumination, occlusion, and pose changes may occur in a video sequence, especially a long one, dynamically selecting the appropriate features is one of the key problems in this approach. To address this issue in multi-cue visual tracking, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method exploits the advantages of sparse representation to dynamically remove unreliable features from the fusion used for tracking. As a result, robust tracking performance is obtained. Experimental results on publicly available videos show that the proposed method outperforms both existing sparse-representation-based and fusion-based trackers.
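The core computation the abstract describes, joint sparse coding over feature-specific dictionaries with a shared sparsity pattern, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the L2,1 coupling penalty, the ISTA solver, all dimensions, and the rule for flagging an unreliable cue are assumptions.

```python
import numpy as np

def prox_l21(C, t):
    """Row-wise group soft-thresholding: the proximal operator of t*||C||_{2,1}."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return C * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

def joint_sparse_code(dicts, obs, lam=0.1, n_iter=200):
    """dicts[k]: (d_k, n) dictionary for feature k; obs[k]: (d_k,) observation.
    Returns an (n, K) coefficient matrix whose rows are jointly sparse."""
    K, n = len(dicts), dicts[0].shape[1]
    C = np.zeros((n, K))
    # Lipschitz constant of the smooth part: 2 * max_k ||D_k^T D_k||_2
    L = 2.0 * max(np.linalg.norm(D.T @ D, 2) for D in dicts)
    for _ in range(n_iter):
        G = np.stack([2.0 * D.T @ (D @ C[:, k] - y)
                      for k, (D, y) in enumerate(zip(dicts, obs))], axis=1)
        C = prox_l21(C - G / L, lam / L)
    return C

# Toy usage with 3 cues; the third is corrupted to mimic an unreliable feature.
rng = np.random.default_rng(0)
dicts = [rng.standard_normal((32, 20)) for _ in range(3)]
obs = [D[:, 3] + 0.01 * rng.standard_normal(32) for D in dicts]
obs[2] = rng.standard_normal(32)
C = joint_sparse_code(dicts, obs)
errs = [float(np.linalg.norm(obs[k] - dicts[k] @ C[:, k])) for k in range(3)]
print(errs)  # the corrupted cue typically shows a much larger reconstruction error
```

The L2,1 norm couples the per-feature codes row-wise, so a dictionary atom is either used by all cues or by none; a cue whose reconstruction error stays high under this shared support is a natural candidate for removal.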
IEEE Transactions on Image Processing | 2015
Xiangyuan Lan; Andy Jinhua Ma; Pong Chi Yuen; Rama Chellappa
Visual tracking using multiple features has proven robust because the features can complement each other. Since different types of variations such as illumination, occlusion, and pose changes may occur in a video sequence, especially a long one, properly selecting and fusing the appropriate features has become one of the key problems in this approach. To address this issue, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method exploits the advantages of sparse representation to dynamically remove unreliable features from the fusion used for tracking. To capture the non-linear similarity of features, we extend the proposed method into a general kernelized framework that can perform feature fusion in various kernel spaces. As a result, robust tracking performance is obtained. Both qualitative and quantitative experimental results on publicly available videos show that the proposed method outperforms both sparse-representation-based and fusion-based trackers.
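The kernelized extension mainly changes the reconstruction term: with a kernel k(.,.), the kernel-space error ||phi(y) - Phi(D)c||^2 expands to k(y,y) - 2 c.k_Dy + c.K_DD.c, so the same proximal updates run on Gram matrices. The sketch below illustrates this for a single feature with an assumed RBF kernel and penalty; the multi-feature case would couple the codes with the L2,1 penalty as in the previous sketch.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between row vectors of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_sparse_code(D, y, lam=0.05, n_iter=300):
    """D: (n_atoms, d) dictionary rows; y: (d,) observation."""
    K_DD = rbf(D, D)                      # (n, n) Gram matrix of atoms
    k_Dy = rbf(D, y[None, :])[:, 0]       # (n,) atom-observation kernel values
    c = np.zeros(D.shape[0])
    L = 2.0 * np.linalg.norm(K_DD, 2)     # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = 2.0 * (K_DD @ c - k_Dy)       # gradient of the kernel-space objective
        c = c - g / L
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)  # soft threshold
    # kernel-space reconstruction error (k(y,y) = 1 for the RBF kernel)
    err = 1.0 - 2.0 * c @ k_Dy + c @ K_DD @ c
    return c, err

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 8))
c, err = kernel_sparse_code(D, D[5] + 0.05 * rng.standard_normal(8))
print(err)  # small: the observation lies near atom 5 in kernel space
```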
IEEE Transactions on Image Processing | 2018
Xiangyuan Lan; Shengping Zhang; Pong Chi Yuen; Rama Chellappa
The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations, while each feature should also have some feature-specific representation patterns that reflect its complementary role in appearance modeling. Unlike existing multi-feature sparse trackers, which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking that jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning method to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
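The decomposition idea can be sketched as splitting each feature's code into a shared part S (row-sparse across features, L2,1) and a feature-specific part E (entry-wise sparse, L1). The formulation and alternating proximal solver below are assumptions for illustration, not the authors' exact objective, and the metric learning component is omitted.

```python
import numpy as np

def prox_l21(C, t):
    """Row-wise group soft-thresholding for the L2,1 penalty."""
    r = np.linalg.norm(C, axis=1, keepdims=True)
    return C * np.maximum(0.0, 1.0 - t / np.maximum(r, 1e-12))

def shared_specific_code(dicts, obs, a=0.1, b=0.05, n_iter=300):
    """Minimize sum_k ||y_k - D_k (s_k + e_k)||^2 + a*||S||_{2,1} + b*||E||_1
    by alternating proximal-gradient steps on S (shared) and E (specific)."""
    K, n = len(dicts), dicts[0].shape[1]
    S, E = np.zeros((n, K)), np.zeros((n, K))
    L = 2.0 * max(np.linalg.norm(D.T @ D, 2) for D in dicts)

    def grad():
        return np.stack([2.0 * D.T @ (D @ (S[:, k] + E[:, k]) - y)
                         for k, (D, y) in enumerate(zip(dicts, obs))], axis=1)

    for _ in range(n_iter):
        S = prox_l21(S - grad() / L, a / L)      # shared sparsity pattern
        T = E - grad() / L
        E = np.sign(T) * np.maximum(np.abs(T) - b / L, 0.0)  # per-feature pattern
    return S, E

rng = np.random.default_rng(2)
dicts = [rng.standard_normal((32, 20)) for _ in range(2)]
obs = [D[:, 4] + 0.01 * rng.standard_normal(32) for D in dicts]
S, E = shared_specific_code(dicts, obs)
print(np.count_nonzero(np.linalg.norm(S, axis=1)), np.count_nonzero(E))
```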
IEEE Transactions on Circuits and Systems for Video Technology | 2017
Shengping Zhang; Xiangyuan Lan; Yuankai Qi; Pong Chi Yuen
Most existing tracking approaches are based on either the tracking-by-detection framework or the tracking-by-matching framework. The former needs to learn a discriminative classifier from positive and negative samples, which causes tracking drift when the samples are unreliable. The latter usually performs tracking by matching local interest points between a target candidate and the tracked target, which is not robust to target appearance changes over time. In this paper, we propose a novel tracking-by-matching framework for robust tracking based on basis matching rather than point matching. In particular, we learn the target model from target images using a set of Gabor basis functions, which have large responses at their corresponding spatial positions after max pooling. During tracking, a target candidate is evaluated by computing the responses of the Gabor basis functions at their corresponding spatial positions. Experimental results on a set of challenging sequences validate that the proposed tracking method outperforms several state-of-the-art methods.
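A simplified version of basis matching can be sketched as below: score a candidate patch by how well its max-pooled Gabor responses match those of the target. The filter-bank parameters, pooling grid, and distance-based score are all illustrative choices, not the paper's learned model.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor(size=11, theta=0.0, lam=4.0, sigma=2.0):
    """A single real Gabor filter at orientation theta."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def pooled_responses(patch, bank, cells=4):
    """Max-pool each filter's response magnitude over a cells x cells grid."""
    h, w = patch.shape
    feats = []
    for g in bank:
        resp = np.abs(convolve2d(patch, g, mode="same"))
        for i in range(cells):
            for j in range(cells):
                feats.append(resp[i*h//cells:(i+1)*h//cells,
                                  j*w//cells:(j+1)*w//cells].max())
    return np.array(feats)

bank = [gabor(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
rng = np.random.default_rng(3)
target = rng.random((32, 32))
candidate = target + 0.05 * rng.standard_normal((32, 32))   # near-duplicate
distractor = rng.random((32, 32))                           # unrelated patch
f_t = pooled_responses(target, bank)
for name, p in [("candidate", candidate), ("distractor", distractor)]:
    print(name, np.linalg.norm(pooled_responses(p, bank) - f_t))
```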
IEEE Transactions on Neural Networks and Learning Systems | 2017
Shengping Zhang; Xiangyuan Lan; Hongxun Yao; Huiyu Zhou; Dacheng Tao; Xuelong Li
In this paper, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the hierarchical organization of the primary visual cortex (area V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding, and pooling. The first three layers stem from models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with a spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method achieves higher tracking accuracy than several state-of-the-art trackers.
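The five-layer pipeline can be sketched schematically as below. Every operator here is a common stand-in (ZCA whitening, absolute-value rectification, L2 normalization, soft-threshold coding, two-level spatial pyramid max pooling), not the paper's exact choice at each layer.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """ZCA whitening of row-vector descriptors X: (n_samples, dim)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(np.cov(Xc, rowvar=False))
    return Xc @ (U / np.sqrt(s + eps)) @ U.T

def encode(X, D, thr=0.2):
    """Soft-threshold coding against a dictionary D: (dim, n_atoms)."""
    A = X @ D
    return np.sign(A) * np.maximum(np.abs(A) - thr, 0.0)

def pyramid_max_pool(codes):
    """codes: (gh, gw, n_atoms) on a spatial grid; pool a 1x1 + 2x2 pyramid."""
    gh, gw, n = codes.shape
    feats = []
    for cells in (1, 2):
        for i in range(cells):
            for j in range(cells):
                cell = codes[i*gh//cells:(i+1)*gh//cells,
                             j*gw//cells:(j+1)*gw//cells]
                feats.append(np.abs(cell).reshape(-1, n).max(axis=0))
    return np.concatenate(feats)

rng = np.random.default_rng(4)
patches = rng.random((16, 36))                    # a 4x4 grid of local descriptors
X = whiten(patches)                               # layer 1: whitening
X = np.abs(X)                                     # layer 2: rectification
X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)  # layer 3: normalization
D = rng.standard_normal((36, 24))                 # random stand-in dictionary
codes = encode(X, D).reshape(4, 4, 24)            # layer 4: coding
feat = pyramid_max_pool(codes)                    # layer 5: pooling
print(feat.shape)                                 # (1 + 4) * 24 atoms = (120,)
```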
European Signal Processing Conference | 2016
Renfei Liu; Xiangyuan Lan; Pong Chi Yuen; Guo-Can Feng
Using multiple features in appearance modeling has been shown to be effective for visual tracking. In this paper, we dynamically measure the importance of different features and propose a robust tracker based on weighted features, which improves the dictionaries in both reconstructive and discriminative ways. We extract multiple features of the target and obtain multiple sparse representations, which play an essential role in classification. After learning an independent dictionary for each feature, we dynamically assign a weight to each feature and select the best candidate with a weighted joint decision measure. Experiments show that our method outperforms several recently proposed trackers.
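A minimal sketch of such a weighted joint decision is given below: each feature votes with a reconstruction error from its own dictionary, and the candidate with the lowest weighted error wins. The lasso solver, the fixed weights, and the scoring rule are assumptions for illustration.

```python
import numpy as np

def lasso_ista(D, y, lam=0.05, n_iter=200):
    """Sparse code of y over dictionary D via plain ISTA."""
    c = np.zeros(D.shape[1])
    L = 2.0 * np.linalg.norm(D.T @ D, 2)
    for _ in range(n_iter):
        c = c - 2.0 * D.T @ (D @ c - y) / L
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)
    return c

def weighted_score(dicts, feats, weights):
    """Lower is better: weighted sum of per-feature reconstruction errors."""
    errs = [np.linalg.norm(y - D @ lasso_ista(D, y))
            for D, y in zip(dicts, feats)]
    return float(np.dot(weights, errs))

rng = np.random.default_rng(5)
dicts = [rng.standard_normal((24, 16)) for _ in range(2)]
good = [D[:, 2] + 0.02 * rng.standard_normal(24) for D in dicts]  # on-target
bad = [rng.standard_normal(24) for _ in dicts]                    # background
w = np.array([0.7, 0.3])   # hypothetical weights, e.g. from past reliability
print(weighted_score(dicts, good, w), weighted_score(dicts, bad, w))
```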
IEEE Transactions on Intelligent Transportation Systems | 2018
Shengping Zhang; Yuankai Qi; Feng Jiang; Xiangyuan Lan; Pong Chi Yuen; Huiyu Zhou
For autonomous driving applications, a car must be able to track objects in the scene in order to estimate where and how they will move, so that the tracker embedded in the car can efficiently alert it for effective collision avoidance. Traditional discriminative object tracking methods usually train a binary classifier via a support vector machine (SVM) scheme to distinguish the target from its background. Despite demonstrated success, the performance of SVM-based trackers is limited because classification depends only on the support vectors (SVs), while the target's dynamic appearance may resemble training samples that were not selected as SVs, especially when the training samples are not linearly separable. In such cases, the tracker may drift to the background and eventually fail to track the target. To address this problem, we propose to integrate point-to-set (image-to-image-set) distance metric learning (DML) into visual tracking and take full advantage of all the training samples when determining the best target candidate. The point-to-set DML is conducted on convolutional neural network features of the training data extracted from the starting frames. When a new frame arrives, target candidates are first projected into the common subspace using the learned mapping functions, and the candidate with the minimal distance to the target template sets is selected as the tracking result. Extensive experimental results show that, even without model updates, the proposed method achieves favorable performance on challenging image sequences compared with several state-of-the-art trackers.
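The point-to-set distance used for candidate selection can be sketched as the distance from a candidate feature to the affine hull of a template set. For brevity, the learned DML projection is replaced by an identity mapping and random vectors stand in for CNN features; only the geometric selection step is shown.

```python
import numpy as np

def affine_hull_dist(x, S):
    """Distance of point x to the affine hull of template set S: (m, d) rows."""
    mu = S.mean(axis=0)
    B = S - mu                                   # directions spanning the hull
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    V = Vt[s > 1e-10]                            # orthonormal basis of the hull
    r = x - mu
    return float(np.linalg.norm(r - V.T @ (V @ r)))  # residual off the hull

rng = np.random.default_rng(6)
templates = 1.0 + 0.1 * rng.standard_normal((8, 64))      # target template set
candidates = [templates.mean(0) + 0.05 * rng.standard_normal(64),  # on-target
              rng.standard_normal(64)]                             # background
d = [affine_hull_dist(c, templates) for c in candidates]
print(d, "-> pick candidate", int(np.argmin(d)))
```

Because the hull is spanned by every template, all training samples contribute to the decision, which is the contrast the abstract draws with SV-only classification.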
International Joint Conference on Artificial Intelligence | 2018
Mang Ye; Zheng Wang; Xiangyuan Lan; Pong Chi Yuen
Cross-modality person re-identification between the thermal and visible domains is extremely important for night-time surveillance applications. Existing works in this field mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, besides the cross-modality discrepancy caused by different camera spectra, visible-thermal person re-identification also suffers from large cross-modality and intra-modality variations caused by different camera views and human poses. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking loss to learn discriminative feature representations. It is advantageous in two respects: 1) it learns features end-to-end directly from the data without extra metric learning steps, and 2) it simultaneously handles cross-modality and intra-modality variations to ensure the discriminability of the learnt representations. Meanwhile, an identity loss is further incorporated to model identity-specific information and handle large intra-class variations. Extensive experiments on two datasets demonstrate superior performance compared with the state-of-the-art methods.
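The dual-path structure with a bi-directional cross-modality ranking term can be sketched as below in PyTorch. The layer sizes, margin, hard-mining rule, and the omission of the identity loss are all simplifications for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPath(nn.Module):
    """One branch per modality; embeddings are L2-normalized."""
    def __init__(self, d_in=128, d_emb=64):
        super().__init__()
        self.vis = nn.Sequential(nn.Linear(d_in, d_emb), nn.ReLU(),
                                 nn.Linear(d_emb, d_emb))
        self.the = nn.Sequential(nn.Linear(d_in, d_emb), nn.ReLU(),
                                 nn.Linear(d_emb, d_emb))
    def forward(self, xv, xt):
        return F.normalize(self.vis(xv), dim=1), F.normalize(self.the(xt), dim=1)

def bidirectional_ranking_loss(ev, et, labels, margin=0.3):
    """Hinge on hardest pairs in both visible->thermal and thermal->visible."""
    d = torch.cdist(ev, et)                      # cross-modality distances
    pos = labels[:, None] == labels[None, :]
    loss = 0.0
    for dist in (d, d.t()):                      # both ranking directions
        d_pos = dist.masked_fill(~pos, float("inf")).min(dim=1).values
        d_neg = dist.masked_fill(pos, float("inf")).min(dim=1).values
        loss = loss + F.relu(margin + d_pos - d_neg).mean()
    return loss

model = DualPath()
xv, xt = torch.randn(8, 128), torch.randn(8, 128)   # stand-in image features
labels = torch.arange(4).repeat(2)                  # two samples per identity
ev, et = model(xv, xt)
loss = bidirectional_ranking_loss(ev, et, labels)
loss.backward()
print(float(loss))
```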
European Conference on Computer Vision | 2018
Mang Ye; Xiangyuan Lan; Pong Chi Yuen
This paper addresses the scalability and robustness issues of estimating labels from imbalanced unlabeled data for unsupervised video-based person re-identification (re-ID). To this end, we propose a novel Robust AnChor Embedding (RACE) framework via deep feature representation learning for large-scale unsupervised video re-ID. Within this framework, anchor sequences representing different persons are first selected to formulate an anchor graph, which also initializes the CNN model to obtain discriminative feature representations for later label estimation. To accurately estimate labels from unlabeled sequences with noisy frames, robust anchor embedding is introduced based on the regularized affine hull. Efficiency is ensured by embedding over kNN anchors instead of the whole anchor set under manifold assumptions. After that, a robust and efficient top-k counts label prediction strategy is proposed to predict the labels of unlabeled image sequences. With the newly estimated labeled sequences, the unified anchor embedding framework further facilitates the feature learning process. Extensive experimental results on a large-scale dataset show that the proposed method outperforms existing unsupervised video re-ID methods.
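The two named steps can be sketched as follows: embed a frame over its kNN anchors by projecting onto a regularized affine hull (coefficients constrained to sum to one, with a ridge term), then predict a sequence label by counting the frames' nearest-anchor votes. The regularizer, k, and the voting rule are assumed values for illustration.

```python
import numpy as np
from collections import Counter

def anchor_embedding(x, anchors, k=3, reg=0.1):
    """Coefficients of x over its k nearest anchors (rows of `anchors`),
    solving min ||x - A.T w||^2 + reg*||w||^2  s.t.  sum(w) = 1 via KKT."""
    idx = np.argsort(np.linalg.norm(anchors - x, axis=1))[:k]
    A = anchors[idx]                             # (k, d) nearest anchors
    M = np.zeros((k + 1, k + 1))
    M[:k, :k] = 2.0 * (A @ A.T + reg * np.eye(k))
    M[:k, k] = 1.0                               # Lagrange multiplier column
    M[k, :k] = 1.0                               # sum-to-one constraint row
    rhs = np.append(2.0 * A @ x, 1.0)
    w = np.linalg.solve(M, rhs)[:k]
    return idx, w

def top_k_counts_label(frames, anchors, anchor_labels):
    """Each frame votes for its nearest anchor's label; majority wins."""
    votes = [anchor_labels[np.argmin(np.linalg.norm(anchors - f, axis=1))]
             for f in frames]
    return Counter(votes).most_common(1)[0][0]

rng = np.random.default_rng(7)
anchors = np.vstack([np.full((1, 16), c) for c in (0.0, 1.0, 2.0)]) \
          + 0.05 * rng.standard_normal((3, 16))
anchor_labels = [0, 1, 2]
frames = 1.0 + 0.1 * rng.standard_normal((5, 16))   # sequence near anchor 1
idx, w = anchor_embedding(frames[0], anchors)
print(idx, w, top_k_counts_label(frames, anchors, anchor_labels))
```

Restricting the solve to kNN anchors keeps the linear system size at k regardless of how many anchors exist, which is where the claimed efficiency comes from.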
ACM Multimedia | 2018
Rui Shao; Xiangyuan Lan; Pong Chi Yuen
In multimedia analysis, one objective of unsupervised visual domain adaptation is to train a classifier that works well on a target domain, given labeled source samples and unlabeled target samples. Feature alignment of the two domains is the key issue that must be addressed to achieve this objective. Inspired by recent studies of Generative Adversarial Networks (GANs) in domain adaptation, this paper proposes a new GAN-based model, named Hierarchical Adversarial Deep Network (HADN), which jointly optimizes feature-level and pixel-level adversarial adaptation within a hierarchical network structure. Specifically, the hierarchical network structure ensures that the knowledge from pixel-level adversarial adaptation can be back-propagated to facilitate the feature-level adaptation, which achieves better feature alignment under the constraint of pixel-level adversarial adaptation. Extensive experiments on various visual recognition tasks show that the proposed method performs favorably against competitive state-of-the-art methods.
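The hierarchical coupling can be sketched schematically: a shared encoder feeds both a feature-level critic and a pixel-level generator/critic pair, so the pixel-level adversarial gradient flows back through the generator into the encoder. All module sizes below are invented, the classifier is omitted, and a real loop would alternate optimizer steps with zeroed gradients; only the gradient flow is demonstrated.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Flatten(), nn.Linear(3*32*32, 128), nn.ReLU())  # shared
feat_disc = nn.Linear(128, 1)                      # feature-level critic
dec = nn.Linear(128, 3*32*32)                      # pixel-level generator
pix_disc = nn.Sequential(nn.Flatten(), nn.Linear(3*32*32, 1))
bce = nn.BCEWithLogitsLoss()

src, tgt = torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32)

# Encoder/generator step: make target features and generated pixels look
# "source" to both critics; the pixel-level term back-propagates through
# dec into enc, which is the hierarchical coupling the abstract describes.
f_t = enc(tgt)
fake = dec(f_t)
g_loss = bce(feat_disc(f_t), torch.ones(4, 1)) \
       + bce(pix_disc(fake), torch.ones(4, 1))
g_loss.backward()

# Critic step (encoder and generator frozen via no_grad): real vs. adapted.
with torch.no_grad():
    f_s, f_t = enc(src), enc(tgt)
    fake = dec(f_t)
d_loss = bce(feat_disc(f_s), torch.ones(4, 1)) \
       + bce(feat_disc(f_t), torch.zeros(4, 1)) \
       + bce(pix_disc(src), torch.ones(4, 1)) \
       + bce(pix_disc(fake), torch.zeros(4, 1))
d_loss.backward()                                  # updates critics only
print(float(g_loss), float(d_loss))
```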