Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaojun Chang is active.

Publication


Featured research published by Xiaojun Chang.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Semantic Pooling for Complex Event Analysis in Untrimmed Videos

Xiaojun Chang; Yaoliang Yu; Yi Yang; Eric P. Xing

Pooling plays an important role in generating a discriminative video representation. In this paper, we propose a new semantic pooling approach for challenging event analysis tasks (e.g., event detection, recognition, and recounting) in long untrimmed Internet videos, especially when only a few shots/segments are relevant to the event of interest while many others are irrelevant or even misleading. The commonly adopted pooling strategies aggregate the shots indifferently in one way or another, resulting in a great loss of information. Instead, in this work we first define a novel notion of semantic saliency that assesses the relevance of each shot to the event of interest. We then prioritize the shots according to their saliency scores, since shots that are semantically more salient are expected to contribute more to the final event analysis. Next, we propose a new isotonic regularizer that is able to exploit the constructed semantic ordering information. The resulting nearly-isotonic support vector machine classifier exhibits higher discriminative power in event analysis tasks. Computationally, we develop an efficient implementation using the proximal gradient algorithm, and we derive new closed-form proximal steps. We conduct extensive experiments on three real-world video datasets and achieve promising improvements.
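
As a rough illustration of the pooling idea, here is a minimal Python sketch: shots are scored for semantic saliency against the event of interest, then pooled with weights that decay along the saliency ranking. The cosine-similarity saliency and the exponential weighting are illustrative stand-ins, not the paper's nearly-isotonic SVM.

```python
import numpy as np

def semantic_saliency(shot_concepts, event_concepts):
    """Score each shot by cosine similarity between its concept-detector
    responses and the event's concept description (a saliency proxy)."""
    sims = shot_concepts @ event_concepts
    norms = np.linalg.norm(shot_concepts, axis=1) * np.linalg.norm(event_concepts)
    return sims / (norms + 1e-12)

def saliency_weighted_pooling(shot_features, saliency, tau=2.0):
    """Pool shot features with weights that decay along the saliency
    ranking, so semantically salient shots dominate the video vector."""
    order = np.argsort(-saliency)                 # most salient first
    weights = np.exp(-np.arange(len(order)) / tau)
    weights /= weights.sum()
    return (weights[:, None] * shot_features[order]).sum(axis=0)

# Toy example: 5 shots, 8-dim features, 4 hypothetical concept detectors.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
concept_scores = rng.random((5, 4))
event_desc = np.array([1.0, 0.0, 1.0, 0.0])       # hypothetical event concepts
video_vec = saliency_weighted_pooling(
    feats, semantic_saliency(concept_scores, event_desc))
```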


IEEE Transactions on Systems, Man, and Cybernetics | 2017

Bi-Level Semantic Representation Analysis for Multimedia Event Detection

Xiaojun Chang; Zhigang Ma; Yi Yang; Zhiqiang Zeng; Alexander G. Hauptmann

Multimedia event detection has been one of the major endeavors in video event analysis, and a variety of approaches have been proposed recently to tackle this problem. Among others, semantic representation has been credited for its promising performance and its desirable support for human-understandable reasoning. To generate a semantic representation, one usually draws on several external image/video archives and applies the concept detectors trained on them to the event videos. Because of the intrinsic differences among these archives, the resulting representations presumably have different predictive capabilities for a given event. Nevertheless, little work has assessed the efficacy of semantic representations at the source level. Moreover, some concepts are plausibly noisy for detecting a specific event. Motivated by these two observations, we propose a bi-level semantic representation analysis method. At the source level, our method learns weights for the semantic representations obtained from different multimedia archives; at the concept level, it restrains the negative influence of noisy or irrelevant concepts. In addition, we particularly focus on efficient multimedia event detection with few positive examples, which is highly desirable in real-world scenarios. We perform extensive experiments on the challenging TRECVID MED 2013 and 2014 datasets with encouraging results that validate the efficacy of the proposed approach.
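
To illustrate the source-level idea only, a toy sketch under strong simplifications: event scores from two concept archives are blended by a convex weight chosen on held-out data. The grid search stands in for the paper's learned source weights, and all names here are hypothetical.

```python
import numpy as np

def combine_sources(scores_a, scores_b, alpha):
    """Convex combination of event scores from two concept archives."""
    return alpha * scores_a + (1.0 - alpha) * scores_b

def pick_source_weight(scores_a, scores_b, labels, grid=np.linspace(0, 1, 21)):
    """Choose the source weight that maximizes validation accuracy at a
    zero decision threshold (a crude proxy for the learned weighting)."""
    best_alpha, best_acc = 0.5, -1.0
    for alpha in grid:
        pred = combine_sources(scores_a, scores_b, alpha) > 0
        acc = (pred == labels).mean()
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha

# Toy validation set: archive A is informative, archive B is pure noise.
rng = np.random.default_rng(1)
labels = rng.random(100) > 0.5
scores_a = np.where(labels, 1.0, -1.0) + 0.3 * rng.standard_normal(100)
scores_b = rng.standard_normal(100)
alpha = pick_source_weight(scores_a, scores_b, labels)   # close to 1.0
```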


IEEE Transactions on Neural Networks | 2017

Semisupervised Feature Analysis by Mining Correlations Among Multiple Tasks

Xiaojun Chang; Yi Yang

In this paper, we propose a novel semisupervised feature selection framework that mines correlations among multiple tasks, and we apply it to different multimedia applications. Instead of independently computing the importance of features for each task, our algorithm leverages shared knowledge from multiple related tasks, thus improving the performance of feature selection. Note that the proposed algorithm is built upon the assumption that different tasks share some common structures. The proposed algorithm selects features in a batch mode, by which the correlations between features are taken into consideration. Besides, considering that labeling a large amount of training data in the real world is both time-consuming and tedious, we adopt manifold learning, which exploits both labeled and unlabeled training data for feature space analysis. Since the objective function is nonsmooth and difficult to solve, we propose an iterative algorithm with fast convergence. Extensive experiments on different applications demonstrate that our algorithm outperforms other state-of-the-art feature selection algorithms.
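
For flavor, a minimal NumPy sketch of l2,1-regularized feature selection across related tasks, using the standard iteratively reweighted least-squares scheme for joint l2,1 minimization. It omits the paper's manifold (semisupervised) term and is a generic stand-in, not the exact objective.

```python
import numpy as np

def l21_feature_selection(X, Y, lam=1.0, iters=50):
    """min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}, solved with the standard
    iteratively reweighted least-squares scheme.  Each row of W is shared
    across the tasks (columns of Y), coupling their feature importances."""
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(iters):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1) + 1e-8
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

# Toy data: 3 related tasks that all depend on the first 5 of 30 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
W_true = np.zeros((30, 3))
W_true[:5] = rng.standard_normal((5, 3))
Y = X @ W_true + 0.1 * rng.standard_normal((100, 3))
W = l21_feature_selection(X, Y)
top_features = np.argsort(-np.linalg.norm(W, axis=1))[:5]   # mostly 0..4
```

Ranking features by the l2 norms of the rows of W is what makes the selection "batch mode": a feature is kept or dropped jointly for all tasks.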


International Journal of Computer Vision | 2015

Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization

Yi Yang; Zhigang Ma; Feiping Nie; Xiaojun Chang; Alexander G. Hauptmann

As a way to relieve the tedious work of manual annotation, active learning plays an important role in many applications of visual concept recognition. In typical active learning scenarios, the number of labelled examples in the seed set is usually small. However, most existing active learning algorithms exploit only the labelled data, which often leads to over-fitting due to the small number of labelled examples. Moreover, while much progress has been made in binary-class active learning, little research attention has been paid to multi-class active learning. In this paper, we propose a semi-supervised batch-mode multi-class active learning algorithm for visual concept recognition. Our algorithm exploits the whole active pool to evaluate the uncertainty of the data. Considering that uncertain examples are often similar to one another, we propose to make the selected data as diverse as possible, for which we explicitly impose a diversity constraint on the objective function. As a multi-class active learning algorithm, our method is able to exploit uncertainty across multiple classes. An efficient algorithm is used to optimize the objective function. Extensive experiments on action recognition, object classification, scene recognition, and event detection demonstrate its advantages.
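
A minimal sketch of batch-mode uncertainty sampling with a diversity heuristic: entropy scores uncertainty across multiple classes, and a greedy max-min step keeps the selected batch spread out so that near-duplicate uncertain items are not chosen together. The greedy selection is an illustrative stand-in for the paper's explicit diversity constraint.

```python
import numpy as np

def multiclass_entropy(probs):
    """Uncertainty of each pool item from its class-posterior estimates."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_batch(features, probs, batch_size, candidate_pool=200):
    """Rank the pool by entropy, then greedily pick a batch whose members
    are far apart (max-min distance) in feature space."""
    uncertain = np.argsort(-multiclass_entropy(probs))[:candidate_pool]
    chosen = [uncertain[0]]
    while len(chosen) < batch_size:
        # Distance from every candidate to its nearest already-chosen item.
        dists = np.min(
            [np.linalg.norm(features[uncertain] - features[c], axis=1)
             for c in chosen],
            axis=0)
        for c in chosen:                        # never re-pick a chosen item
            dists[np.where(uncertain == c)[0][0]] = -np.inf
        chosen.append(uncertain[np.argmax(dists)])
    return np.array(chosen)

# Toy pool: 500 items, 16-dim features, 4-class posterior estimates.
rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 16))
probs = rng.dirichlet(np.ones(4), size=500)
batch = select_batch(feats, probs, batch_size=10)
```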


IEEE Transactions on Image Processing | 2017

Feature Interaction Augmented Sparse Learning for Fast Kinect Motion Detection

Xiaojun Chang; Zhigang Ma; Ming Lin; Yi Yang; Alexander G. Hauptmann

Kinect sensing devices have been widely used in human-computer interaction entertainment. A fundamental issue is to detect users' motions accurately and quickly. In this paper, we tackle it by proposing a linear algorithm augmented with feature interactions. The linear property guarantees speed, whereas the feature interactions capture higher-order effects in the data to enhance accuracy. The Schatten-p norm is leveraged to integrate the main linear effect and the higher-order nonlinear effect by mining the correlation between them. The resulting classification model is a desirable combination of speed and accuracy. We propose a novel solution to the resulting objective function. Experiments are performed on three public Kinect-based entertainment datasets related to fitness and gaming. The results show that our method is advantageous for motion detection in real-time Kinect entertainment environments.
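
For reference, a minimal NumPy sketch of the Schatten-p norm that serves as the coupling regularizer; the norm itself is standard, while its exact role in the paper's objective is only summarized above.

```python
import numpy as np

def schatten_p_norm(W, p):
    """Schatten-p norm: the l_p norm of the singular values of W.
    p=1 gives the nuclear norm (low-rank-inducing); p=2 the Frobenius norm."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return (sigma ** p).sum() ** (1.0 / p)

# Example: a near-low-rank matrix has a large nuclear-to-Frobenius gap.
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
print(schatten_p_norm(W, 1), schatten_p_norm(W, 2))
```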


IEEE Transactions on Image Processing | 2017

Revealing Event Saliency in Unconstrained Video Collection

Dingwen Zhang; Junwei Han; Lu Jiang; Senmao Ye; Xiaojun Chang

Recent progress in multimedia event detection has enabled us to find videos about a predefined event in a large-scale video collection. Research toward more intrinsic, unsupervised video understanding is an interesting but understudied field. Specifically, given a collection of videos sharing a common event of interest, the goal is to discover the salient fragments, i.e., the short video fragments that concisely portray the underlying event of interest, in each video. To explore this novel direction, this paper proposes an unsupervised event saliency revealing framework. It first extracts features from multiple modalities to represent each shot in the given video collection. These shots are then clustered to build a cluster-level event saliency revealing framework, which explores useful information cues (i.e., the intra-cluster prior, inter-cluster discriminability, and inter-cluster smoothness) through a concise optimization model. Compared with existing methods, our approach highlights the intrinsic stimulus of the unseen event within a video in an unsupervised fashion, and could thus benefit a wide range of multimedia tasks such as video browsing, understanding, and search. To quantitatively verify the proposed method, we systematically compare it to a number of baseline methods on the TRECVID benchmarks. Experimental results demonstrate its effectiveness and efficiency.
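
A toy sketch of the clustering step under strong simplifications: shots from the collection are clustered, and each shot's proximity to its cluster centroid serves as a crude proxy for the intra-cluster prior. The paper's full optimization with inter-cluster discriminability and smoothness terms is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def shot_saliency_by_clustering(shot_features, n_clusters=5):
    """Cluster shots across the collection, then score each shot by its
    proximity to its cluster centroid: shots near a pattern shared across
    the collection are more likely to depict the common event."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(shot_features)
    dists = np.linalg.norm(shot_features - km.cluster_centers_[labels], axis=1)
    return 1.0 / (1.0 + dists)

# Toy collection: 300 shots with 64-dim multi-modal features.
rng = np.random.default_rng(0)
shots = rng.standard_normal((300, 64))
saliency = shot_saliency_by_clustering(shots)
```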


ACM Multimedia | 2015

Searching Persuasively: Joint Event Detection and Evidence Recounting with Limited Supervision

Xiaojun Chang; Yaoliang Yu; Yi Yang; Alexander G. Hauptmann

Multimedia event detection (MED) and multimedia event recounting (MER) are fundamental tasks in managing large amounts of unconstrained web videos, and have attracted much attention in recent years. Most existing systems perform MER as a post-processing step on top of the MED results. In order to leverage the mutual benefits of the two tasks, we propose a joint framework that simultaneously detects high-level events and localizes the indicative concepts of the events. Our premise is that a good recounting algorithm should not only explain the detection result, but should also be able to assist detection in the first place. Coupled in a joint optimization framework, recounting improves detection by pruning irrelevant noisy concepts, while detection directs recounting to the most discriminative evidence. To better utilize the powerful and interpretable semantic video representation, we segment each video into several shots and exploit the rich temporal structure at the shot level. The consequent computational challenge is carefully addressed through a significant improvement of the current ADMM algorithm, which, after eliminating all inner loops and supplying novel closed-form solutions for all intermediate steps, enables us to efficiently process extremely large video corpora. We test the proposed method on the large-scale TRECVID MEDTest 2014 and MEDTest 2013 datasets, and obtain very promising results for both MED and MER.
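
To make the computational point concrete: ADMM scales well when every intermediate step has a closed form. Below is a generic lasso-style example (not the paper's MED/MER objective), where the two subproblems reduce to a cached linear solve and an elementwise soft threshold, so there are no inner loops.

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form proximal step of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(X, y, lam=1.0, rho=1.0, iters=100):
    """min_w 0.5 * ||X w - y||^2 + lam * ||w||_1 via ADMM.  Both updates
    are closed-form (a cached Cholesky solve and a soft threshold)."""
    d = X.shape[1]
    w = np.zeros(d); z = np.zeros(d); u = np.zeros(d)
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(d))   # factor once
    for _ in range(iters):
        rhs = X.T @ y + rho * (z - u)
        w = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z = soft_threshold(w + u, lam / rho)
        u = u + w - z
    return z

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.05 * rng.standard_normal(80)
w_hat = admm_lasso(X, y)          # sparse and close to w_true
```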


IEEE Transactions on Knowledge and Data Engineering | 2017

Beyond Trace Ratio: Weighted Harmonic Mean of Trace Ratios for Multiclass Discriminant Analysis

Zhihui Li; Feiping Nie; Xiaojun Chang; Yi Yang

Linear discriminant analysis (LDA) is one of the most important supervised linear dimensionality reduction techniques. It seeks to learn a low-dimensional representation of the original high-dimensional feature space through a transformation matrix, while preserving discriminative information by maximizing the between-class scatter and minimizing the within-class scatter. However, conventional LDA is formulated to maximize the arithmetic mean of trace ratios, which suffers from domination by the largest objectives and might deteriorate recognition accuracy in practical applications with a large number of classes. In this paper, we propose a new criterion that maximizes the weighted harmonic mean of trace ratios, which effectively avoids the domination problem without introducing any difficulties into the formulation. An efficient algorithm with fast convergence is developed to solve the resulting problem, finding the globally optimal solution of each subproblem using only an eigenvalue decomposition per iteration. Finally, we conduct extensive experiments on both synthetic and real-life datasets for various tasks, including face recognition, human motion recognition, and head pose recognition, to illustrate the effectiveness and superiority of our method. The experimental results indicate that our algorithm consistently outperforms the compared methods on all datasets.
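
To see why the harmonic mean counteracts domination, compare the two aggregations of per-class trace ratios in schematic form; the per-class scatter matrices S_b^(k), S_w^(k) and weights w_k below are notational assumptions, not the paper's exact definitions. For ratios r_1 = 100 and r_2 = 0.1 with equal weights, the arithmetic mean is about 50 (the poorly separated class is invisible), while the harmonic mean is about 0.2 (the poorly separated class dominates the objective).

```latex
% Conventional criterion: a weighted arithmetic mean of per-class
% trace ratios r_k, dominated by the largest ratios.
\max_{W} \; \sum_{k=1}^{c} w_k \, r_k(W),
\qquad
r_k(W) = \frac{\operatorname{tr}\!\bigl(W^\top S_b^{(k)} W\bigr)}
              {\operatorname{tr}\!\bigl(W^\top S_w^{(k)} W\bigr)}

% Proposed criterion: a weighted harmonic mean, dominated by the
% smallest ratios, so one well-separated class cannot mask a poorly
% separated one.
\max_{W} \; \Bigl( \sum_{k=1}^{c} \frac{w_k}{r_k(W)} \Bigr)^{-1}
```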


ACM Transactions on Knowledge Discovery From Data | 2016

Convex Sparse PCA for Unsupervised Feature Learning

Xiaojun Chang; Feiping Nie; Yi Yang; Chengqi Zhang; Heng Huang

Principal component analysis (PCA) has been widely applied to dimensionality reduction and data pre-processing in engineering, biology, social science, and the like. Classical PCA and its variants seek linear projections of the original variables that yield low-dimensional feature representations with maximal variance. One limitation is that the results of PCA are difficult to interpret; besides, classical PCA is vulnerable to noisy data. In this paper, we propose a Convex Sparse Principal Component Analysis (CSPCA) algorithm and apply it to feature learning. First, we show that PCA can be formulated as a low-rank regression optimization problem. Based on this, the l2,1-norm minimization is incorporated into the objective function to make the regression coefficients sparse and thereby robust to outliers. Also, based on the sparse model used in CSPCA, an optimal weight is assigned to each of the original features, which in turn gives the output good interpretability. With the output of our CSPCA, we can effectively analyze the importance of each feature under the PCA criteria. Our new objective function is convex, and we propose an iterative algorithm to optimize it. We apply the CSPCA algorithm to feature selection and conduct extensive experiments on seven benchmark datasets. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art unsupervised feature selection algorithms.
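
A small numerical check of the first step of the argument, that PCA can be written as a low-rank regression problem: over orthonormal projections A, the reconstruction objective ||X - X A A^T||_F^2 is minimized by the top-k principal directions. CSPCA then adds the l2,1 penalty to a regression of this form so that row norms of the coefficient matrix rank the original features; that step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
X -= X.mean(axis=0)                      # PCA assumes centered data

k = 3
# Classical PCA: top-k right singular vectors of X.
V = np.linalg.svd(X, full_matrices=False)[2][:k].T

# Regression view: min_A ||X - X A A^T||_F^2 over orthonormal A
# is solved by the same top-k principal directions.
err_pca = np.linalg.norm(X - X @ V @ V.T) ** 2

A = np.linalg.qr(rng.standard_normal((10, k)))[0]   # random orthonormal A
err_rand = np.linalg.norm(X - X @ A @ A.T) ** 2
assert err_pca <= err_rand               # PCA minimizes the regression loss
```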


IEEE Transactions on Multimedia | 2017

The Many Shades of Negativity

Zhigang Ma; Xiaojun Chang; Yi Yang; Nicu Sebe; Alexander G. Hauptmann

Complex event detection has been progressively researched in recent years, given the broad interest in video indexing and retrieval. To fulfill the purpose of event detection, one needs to train a classifier using both positive and negative examples. Current classifier training treats the negative videos as equally negative. However, we notice that many negative videos resemble the positive videos to different degrees. Intuitively, we may capture more informative cues from the negative videos if we assign them fine-grained labels, thus benefiting classifier learning. Aiming at this, we use a statistical method on both the positive and negative examples to obtain the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so as to treat them differently for more effective exploitation. The resulting fine-grained labels may not be optimal for capturing the discriminative cues from the negative videos. Hence, we propose to jointly optimize the fine-grained labels with the classifier learning, which makes the two mutually reinforcing. Meanwhile, the labels of positive examples are supposed to remain unchanged, so we additionally introduce a constraint for this purpose. In addition, state-of-the-art deep convolutional neural network features are leveraged in our approach to further boost detection performance. Extensive experiments on the challenging TRECVID MED 2014 dataset validate the efficacy of the proposed approach.

Collaboration


Dive into Xiaojun Chang's collaborations.

Top Co-Authors

Feiping Nie (Northwestern Polytechnical University)
Zhigang Ma (Carnegie Mellon University)
Minnan Luo (Xi'an Jiaotong University)
Qinghua Zheng (Xi'an Jiaotong University)
Xue Li (University of Queensland)
Yaoliang Yu (Carnegie Mellon University)