Junliang Xing
Chinese Academy of Sciences
Publication
Featured research published by Junliang Xing.
European Conference on Computer Vision | 2014
Jin Gao; Haibin Ling; Weiming Hu; Junliang Xing
Modeling the target appearance is critical in many modern visual tracking algorithms. Many tracking-by-detection algorithms formulate the probability of target appearance as exponentially related to the confidence of a classifier output. By contrast, in this paper we directly analyze this probability using Gaussian Process Regression (GPR), and introduce a latent variable to assist the tracking decision. Our observation model for regression is learned in a semi-supervised fashion by using both labeled samples from previous frames and unlabeled samples that are tracking candidates extracted from the current frame. We further divide the labeled samples into two categories: auxiliary samples collected from the very early frames and target samples from the most recent frames. The auxiliary samples are dynamically re-weighted by the regression, and the final tracking result is determined by fusing decisions from two individual trackers, one derived from the auxiliary samples and the other from the target samples. Together, these ingredients enable our tracker, denoted TGPR, to alleviate the drifting issue from various aspects. The effectiveness of TGPR is demonstrated by its excellent performance on three recently proposed public benchmarks, involving 161 sequences in total, in comparison with state-of-the-art methods.
Computer Vision and Pattern Recognition | 2009
Junliang Xing; Haizhou Ai; Shihong Lao
This paper presents an online detection-based two-stage multi-object tracking method for dense visual surveillance scenarios with a single camera. In the local stage, a particle filter with observer selection that can deal with partial object occlusion is used to generate a set of reliable tracklets. In the global stage, the detection responses are collected from a temporal sliding window to deal with the ambiguity caused by full object occlusion and to generate a set of potential tracklets. The reliable tracklets generated in the local stage and the potential tracklets generated within the temporal sliding window are associated by the Hungarian algorithm on a modified pairwise tracklet association cost matrix to obtain the globally optimal association. The method is applied to the pedestrian class and evaluated on two challenging datasets. The experimental results demonstrate the effectiveness of our method.
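The association step described above can be illustrated with a minimal sketch. It assumes a small, made-up cost matrix in which `cost[i][j]` is the cost of linking reliable tracklet `i` to potential tracklet `j`; the function name and values are hypothetical. For clarity it finds the optimum by exhaustive search rather than by the Hungarian algorithm, which reaches the same minimum-cost assignment in polynomial time.

```python
from itertools import permutations


def best_association(cost):
    """Return (total_cost, assignment) minimizing the summed linking cost.

    cost[i][j] is a hypothetical cost of linking reliable tracklet i to
    potential tracklet j. Exhaustive search is used only for illustration;
    it is practical for tiny matrices, not for real tracking workloads.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the potential tracklet assigned to reliable tracklet i.
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Made-up 3x3 association cost matrix.
cost = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
total, assignment = best_association(cost)
print(total, assignment)
```

Here `assignment[i]` gives the column chosen for row `i`; each row and column is used exactly once, which is the one-to-one constraint the Hungarian algorithm also enforces.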
Biochemical Engineering Journal | 2003
Mingfang Luo; Junliang Xing; Zhongxuan Gou; Shuangyue Li; Huizhou Liu; Jiayong Chen
Several parameters that influence dibenzothiophene (DBT) desulfurization by lyophilized cells of Pseudomonas delafieldii R-8 were studied in the presence of dodecane. Aqueous media with pH in the range 4.6-8.5 made no obvious difference to the desulfurization activity. The rate and extent of desulfurization were strongly dependent on the oil-to-water volume ratio, the DBT concentration and the cell concentration. The specific desulfurization rates of DBT and 4,6-dimethyl DBT (4,6-DMDBT) could reach 11.4 and 9.4 mmol sulfur kg⁻¹ dry cells (DCW) h⁻¹, respectively. The desulfurization pattern of DBT was represented by the Michaelis-Menten equation. The kinetic parameters, the limiting maximal velocity (V_max) and Michaelis constant (K_m), for desulfurization of DBT were calculated
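For reference, the Michaelis-Menten equation named in this abstract has the standard form below. Mapping v to the specific desulfurization rate and [S] to the DBT concentration is an assumption made here for illustration; the abstract does not spell out the variables.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}
```

In this form, v approaches V_max as [S] grows large, and v equals V_max / 2 when [S] = K_m.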
IEEE Transactions on Image Processing | 2011
Junliang Xing; Haizhou Ai; Liwei Liu; Shihong Lao
Multiple object tracking (MOT) is a very challenging task, yet it is of fundamental importance for many practical applications. In this paper, we focus on the problem of tracking multiple players in sports video, which is even more difficult due to the abrupt movements of players and their complex interactions. To handle these difficulties, we present a new MOT algorithm that contributes at both the observation modeling level and the tracking strategy level. For the observation modeling, we develop a progressive observation modeling process that is able to provide strong tracking observations and greatly facilitate the tracking task. For the tracking strategy, we propose a dual-mode two-way Bayesian inference approach which dynamically switches between an offline general model and an online dedicated model to deal with single isolated object tracking and multiple occluded object tracking integrally by forward filtering and backward smoothing. Extensive experiments on different kinds of sports videos, including football, basketball and hockey, demonstrate the effectiveness and efficiency of the proposed method.
European Conference on Computer Vision | 2016
Chi Su; Shiliang Zhang; Junliang Xing; Wen Gao; Qi Tian
The visual appearance of a person is easily affected by many factors such as pose variations, viewpoint changes and camera parameter differences. This makes person Re-Identification (ReID) across multiple cameras a very challenging task. This work is motivated to learn mid-level human attributes that are robust to such visual appearance variations. We propose a semi-supervised attribute learning framework that progressively boosts the accuracy of attributes using only a limited amount of labeled data. Specifically, this framework involves three-stage training. A deep Convolutional Neural Network (dCNN) is first trained on an independent dataset labeled with attributes. Then it is fine-tuned on another dataset labeled only with person IDs using our defined triplet loss. Finally, the updated dCNN predicts attribute labels for the target dataset, which is combined with the independent dataset for the final round of fine-tuning. The predicted attributes, namely deep attributes, exhibit superior generalization ability across different datasets. By directly using the deep attributes with a simple cosine distance, we obtain surprisingly good accuracy on four person ReID datasets. Experiments also show that a simple metric learning module further boosts our method, making it significantly outperform many recent works.
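The two ingredients mentioned above, cosine distance between attribute vectors and a triplet loss, can be sketched as follows. This is a generic hinge-style triplet loss written for illustration; the abstract does not give the authors' own definition, so the loss form, the `margin` value, and the function names are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic hinge triplet loss (an assumed form, not the paper's).

    The loss is zero once the anchor is closer to the positive (same
    person) than to the negative (different person) by at least `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D "attribute" vectors, made up for this example.
same = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])   # well separated
swapped = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # violated
print(same, swapped)
```

Ranking gallery images by `cosine_distance` to a query vector would then give a simple retrieval baseline of the kind the abstract describes.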
International Conference on Computer Vision | 2013
Junliang Xing; Jin Gao; Bing Li; Weiming Hu; Shuicheng Yan
Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via L1-norm minimization, these so-called L1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these L1-tracking approaches, which has not been fully studied. We propose to treat template updating, from a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them in the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
ACM Transactions on Multimedia Computing, Communications, and Applications | 2014
Luoqi Liu; Junliang Xing; Si Liu; Hui Xu; Xi Zhou; Shuicheng Yan
Beauty e-Experts, a fully automatic system for makeover recommendation and synthesis, is developed in this work. The makeover recommendation and synthesis system simultaneously considers many kinds of makeover items on hairstyle and makeup. Given a user-provided frontal face image with short/bound hair and no/light makeup, the Beauty e-Experts system not only recommends the most suitable hairdo and makeup, but also synthesizes the virtual hairdo and makeup effects. To acquire enough knowledge for beauty modeling, we built the Beauty e-Experts Database, which contains 1,505 female photos with a variety of attributes annotated with different discrete values. We organize these attributes into two different categories, beauty attributes and beauty-related attributes. Beauty attributes refer to those values that are changeable during the makeover process and thus need to be recommended by the system. Beauty-related attributes are those values that cannot be changed during the makeup process but can help the system to perform recommendation. Based on this Beauty e-Experts Dataset, two problems are addressed for the Beauty e-Experts system: what to recommend and how to wear it, which describes a similar process of selecting hairstyle and cosmetics in daily life. For the what-to-recommend problem, we propose a multiple tree-structured supergraph model to explore the complex relationships among high-level beauty attributes, mid-level beauty-related attributes, and low-level image features. Based on this model, the most compatible beauty attributes for a given facial image can be efficiently inferred. For the how-to-wear-it problem, an effective and efficient facial image synthesis module is designed to seamlessly synthesize the recommended makeovers into the user facial image. We have conducted extensive experiments on testing images of various conditions to evaluate and analyze the proposed system. 
The experimental results demonstrate the effectiveness and efficiency of the proposed system.
European Conference on Computer Vision | 2016
Shengtao Xiao; Jiashi Feng; Junliang Xing; Hanjiang Lai; Shuicheng Yan; Ashraf A. Kassim
In this work, we introduce a novel Recurrent Attentive-Refinement (RAR) network for facial landmark detection under unconstrained conditions, which suffers from challenges such as facial occlusions and/or pose variations. RAR follows the pipeline of cascaded regression that refines landmark locations progressively. However, instead of updating all the landmark locations together, RAR refines the landmark locations sequentially at each recurrent stage. In this way, more reliable landmark points are refined earlier and help to infer the locations of other challenging landmarks that may be affected by occlusions and/or extreme poses. RAR can thus effectively control detection errors from those challenging landmarks and improve overall performance even in the presence of heavy occlusions and/or extreme conditions. To determine the sequence of landmarks, RAR employs an attentive-refinement mechanism. The attention LSTM (A-LSTM) and refinement LSTM (R-LSTM) models are introduced in RAR. At each recurrent stage, A-LSTM implicitly identifies a reliable landmark as the attention center. Following the sequence of attention centers, R-LSTM sequentially refines the landmarks near or correlated with the attention centers and provides the final detection results. To further enhance algorithmic robustness, instead of using the mean shape for initialization, RAR adaptively determines the initialization by selecting from a pool of shape centers clustered from all training shapes. As an end-to-end trainable model, RAR demonstrates superior performance in detecting challenging landmarks in comprehensive experiments, and it also establishes a new state of the art on the 300-W, COFW and AFLW benchmark datasets.
Computer Vision and Pattern Recognition | 2013
Xinchu Shi; Haibin Ling; Junliang Xing; Weiming Hu
In this paper we formulate multi-target tracking (MTT) as a rank-1 tensor approximation problem and propose an ℓ1 norm tensor power iteration solution. In particular, a high-order tensor is constructed based on trajectories in the time window, with each tensor element being the affinity of the corresponding trajectory candidate. The local assignment variables are ℓ1-normalized vectors, which are used to approximate the rank-1 tensor. Our approach provides a flexible and effective formulation in which both pairwise and high-order association energies can be used. We also show the close relation between our formulation and the multi-dimensional assignment (MDA) model. To solve the optimization in the rank-1 tensor approximation, we propose an algorithm that iteratively powers the intermediate solution and then applies an ℓ1 normalization. Aside from effectively capturing high-order motion information, the proposed solver runs efficiently with proven convergence. Experimental validation is conducted on two challenging datasets, and our method demonstrates promising performance on both.
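The "power, then ℓ1-normalize" pattern can be illustrated in its simplest setting: an order-2 tensor, i.e. a matrix. The sketch below is the classical matrix power iteration with ℓ1 normalization, not the paper's tensor algorithm; the function name and the affinity values are made up for this example.

```python
def l1_power_iteration(A, iters=100):
    """Power iteration on a square matrix with l1 normalization.

    Each step multiplies the current vector by A and rescales it so that
    its absolute entries sum to 1. For a matrix with positive entries this
    converges to the l1-normalized dominant eigenvector.
    """
    n = len(A)
    x = [1.0 / n] * n  # uniform start, already l1-normalized
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = sum(abs(v) for v in y)
        x = [v / scale for v in y]
    return x


# Made-up symmetric affinity matrix with positive entries.
A = [
    [2.0, 1.0],
    [1.0, 3.0],
]
x = l1_power_iteration(A)
print(x)
```

For this matrix the entries of the dominant eigenvector are in the golden ratio, so the ratio `x[1] / x[0]` gives a quick check that the iteration has converged.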
European Conference on Computer Vision | 2016
Yanghao Li; Cuiling Lan; Junliang Xing; Wenjun Zeng; Chunfeng Yuan; Jiaying Liu
Human action recognition from well-segmented 3D skeleton data has been intensively studied and has attracted increasing attention. Online action detection goes one step further and is more challenging: it identifies the action type and localizes the action positions on the fly from untrimmed stream data. In this paper, we study the problem of online action detection from streaming skeleton data. We propose a multi-task end-to-end Joint Classification-Regression Recurrent Neural Network to better explore the action type and temporal localization information. By employing a joint classification and regression optimization objective, this network is capable of automatically localizing the start and end points of actions more accurately. Specifically, by leveraging the merits of the deep Long Short-Term Memory (LSTM) subnetwork, the proposed model automatically captures the complex long-range temporal dynamics, which naturally avoids the typical sliding window design and thus ensures high computational efficiency. Furthermore, the regression subtask provides the ability to forecast an action prior to its occurrence. To evaluate our proposed model, we build a large streaming video dataset with annotations. Experimental results on our dataset and the public G3D dataset both demonstrate very promising performance of our scheme.