Publications


Featured research published by Jingkuan Song.


ACM Multimedia | 2011

Multiple feature hashing for real-time large scale near-duplicate video retrieval

Jingkuan Song; Yi Yang; Zi Huang; Heng Tao Shen

Near-duplicate video retrieval (NDVR) has recently attracted substantial research attention due to the exponential growth of online videos. It has applications in many areas, such as copyright protection, video tagging, and online video usage monitoring. Most existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Moreover, while accuracy has been the main concern in the previous literature, the scalability of NDVR algorithms for large-scale video datasets has rarely been addressed. In this paper, we present a novel approach, Multiple Feature Hashing (MFH), to tackle both the accuracy and the scalability issues of NDVR. MFH preserves the local structure information of each individual feature and also globally considers the local structures for all the features to learn a group of hash functions, which map the video keyframes into the Hamming space and generate a series of binary codes to represent the video dataset. We evaluate our approach on a public video dataset and a large-scale video dataset consisting of 132,647 videos, which we collected from YouTube. The experimental results show that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.
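
The recipe the abstract describes, per-feature local structure combined globally and then binarized, can be illustrated with a short sketch. This is not the authors' implementation; the Laplacian construction, the k-NN parameter, and the zero-threshold binarization are all simplifying assumptions.

```python
# Illustrative multi-feature hashing sketch (not the authors' code).
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=5):
    # Symmetrized k-NN affinity and its unnormalized graph Laplacian.
    W = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W

def multi_feature_hash(feature_views, n_bits=16):
    # feature_views: list of (n_samples, d_i) arrays, one per feature type.
    # Summing the per-feature Laplacians gives the "global" view over features.
    L = sum(knn_laplacian(X) for X in feature_views)
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]          # smallest nontrivial eigenvectors
    return (Y > 0).astype(np.uint8)    # binarize: one hash code per sample

rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 32)), rng.normal(size=(100, 64))]
print(multi_feature_hash(views).shape)  # (100, 16)
```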


IEEE Transactions on Multimedia | 2013

Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis

Yi Yang; Jingkuan Song; Zi Huang; Zhigang Ma; Nicu Sebe; Alexander G. Hauptmann

Multimedia data are usually represented by multiple features. In this paper, we propose a new algorithm, namely Multi-feature Learning via Hierarchical Regression, for multimedia semantics understanding, where two issues are considered. First, labeling a large amount of training data is labor-intensive, so it is meaningful to effectively leverage unlabeled data to facilitate multimedia semantics understanding. Second, given that multimedia data can be represented by multiple features, it is advantageous to develop an algorithm that combines evidence obtained from different features to infer reliable multimedia semantic concept classifiers. We design a hierarchical regression model to exploit the information derived from each type of feature, which is then collaboratively fused to obtain a multimedia semantic concept classifier. Both the label information and the data distribution of the different features representing the multimedia data are considered. The algorithm can be applied to a wide range of multimedia applications, and experiments are conducted on video data for video concept annotation and action recognition. On the Trecvid and CareMedia video datasets, the experimental results show that it is beneficial to combine multiple features, and that the performance of the proposed algorithm is remarkable when only a small amount of labeled training data is available.
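
As a rough illustration of combining evidence from multiple features, the sketch below fits one linear classifier per feature view and fuses their decision scores. It stands in for, and is much simpler than, the paper's hierarchical regression; the equal-weight late fusion is an assumption.

```python
# Stand-in for multi-feature fusion: one classifier per view, fused scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fused_predict(train_views, y_train, test_views):
    scores = []
    for X_tr, X_te in zip(train_views, test_views):
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_train)
        scores.append(clf.decision_function(X_te))    # per-view evidence
    return (np.mean(scores, axis=0) > 0).astype(int)  # equal-weight fusion

rng = np.random.default_rng(1)
y_tr = rng.integers(0, 2, size=80)
y_te = rng.integers(0, 2, size=20)
train = [rng.normal(size=(80, 16)) + y_tr[:, None],
         rng.normal(size=(80, 8)) + y_tr[:, None]]
test = [rng.normal(size=(20, 16)) + y_te[:, None],
        rng.normal(size=(20, 8)) + y_te[:, None]]
print((fused_predict(train, y_tr, test) == y_te).mean())  # fused accuracy
```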


IEEE Transactions on Multimedia | 2013

Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval

Jingkuan Song; Yi Yang; Zi Huang; Heng Tao Shen; Jiebo Luo

Near-duplicate video retrieval (NDVR) has recently attracted much research attention due to the exponential growth of online videos. It has many applications, such as copyright protection, automatic video tagging, and online video monitoring. Many existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Moreover, while accuracy has been the main concern in the previous literature, the scalability of NDVR algorithms for large-scale video datasets has rarely been addressed. In this paper, we present a novel approach, Multiple Feature Hashing (MFH), to tackle both the accuracy and the scalability issues of NDVR. MFH preserves the local structural information of each individual feature and also globally considers the local structures for all the features to learn a group of hash functions that map the video keyframes into the Hamming space and generate a series of binary codes to represent the video dataset. We evaluate our approach on a public video dataset and a large-scale video dataset consisting of 132,647 videos that we collected from YouTube. This dataset has been released (http://itee.uq.edu.au/shenht/UQ_VIDEO/). The experimental results show that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.
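
Once keyframes are mapped to binary codes, near-duplicate search reduces to comparisons in the Hamming space. Below is a minimal linear-scan sketch; a production system would use bucketed or multi-index lookups, and the layout of one array entry per bit is an assumption.

```python
# Linear-scan Hamming search over binary codes (one uint8 entry per bit).
import numpy as np

def hamming_search(query_code, db_codes, top_k=5):
    # query_code: (n_bits,), db_codes: (n_items, n_bits), entries in {0, 1}.
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:top_k]   # indices of the nearest codes

rng = np.random.default_rng(2)
db = rng.integers(0, 2, size=(1000, 16), dtype=np.uint8)
print(hamming_search(db[42], db))      # item 42 ranks first (distance 0)
```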


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

A Survey on Learning to Hash

Jingdong Wang; Ting Zhang; Jingkuan Song; Nicu Sebe; Heng Tao Shen

Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning-to-hash algorithms, categorize them according to how they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We separate quantization from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present the evaluation protocols and a general performance analysis, and point out that quantization algorithms perform best in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
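
Since the survey singles out quantization as the strongest family, a minimal product quantization encoder may help make the idea concrete: split each vector into subspaces, run k-means in each, and store centroid indices. This is a generic sketch, not tied to any specific method in the survey; the subspace count and codebook size are illustrative.

```python
# Generic product quantization sketch: subspace k-means, centroid indices.
import numpy as np
from sklearn.cluster import KMeans

def pq_encode(X, n_subspaces=4, n_centroids=16):
    subdims = np.array_split(np.arange(X.shape[1]), n_subspaces)
    codes = np.empty((X.shape[0], n_subspaces), dtype=np.uint8)
    for j, dims in enumerate(subdims):
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0)
        codes[:, j] = km.fit(X[:, dims]).labels_  # centroid index per subspace
    return codes                                  # here: 4 bytes per vector

rng = np.random.default_rng(3)
print(pq_encode(rng.normal(size=(200, 64))).shape)  # (200, 4)
```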


British Machine Vision Conference | 2015

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

Dan Xu; Elisa Ricci; Yan Yan; Jingkuan Song; Nicu Sebe

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes. While most existing works merely use hand-crafted appearance and motion features, we propose Appearance and Motion DeepNet (AMDN), which utilizes deep neural networks to automatically learn feature representations. To exploit the complementary information of both appearance and motion patterns, we introduce a novel double fusion framework, combining the benefits of both traditional early fusion and late fusion strategies. Specifically, stacked denoising autoencoders are proposed to separately learn appearance and motion features as well as a joint representation (early fusion). Based on the learned representations, multiple one-class SVM models are used to predict the anomaly score of each input, and these scores are then integrated with a late fusion strategy for final anomaly detection. We evaluate the proposed method on two publicly available video surveillance datasets, showing competitive performance with respect to state-of-the-art approaches.
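
The late-fusion stage can be sketched compactly. The snippet below assumes the deep representations have already been learned (random arrays stand in for autoencoder codes) and shows one one-class SVM per representation with equal-weight score fusion; the nu and kernel settings are assumptions.

```python
# Late-fusion anomaly scoring over pre-learned representations.
import numpy as np
from sklearn.svm import OneClassSVM

def fused_anomaly_scores(train_reps, test_reps):
    scores = []
    for Z_tr, Z_te in zip(train_reps, test_reps):
        ocsvm = OneClassSVM(nu=0.1, kernel="rbf").fit(Z_tr)
        scores.append(-ocsvm.score_samples(Z_te))  # higher = more anomalous
    return np.mean(scores, axis=0)                 # equal-weight late fusion

rng = np.random.default_rng(4)
train = [rng.normal(size=(200, 32)), rng.normal(size=(200, 32))]     # "normal"
test = [rng.normal(size=(5, 32)) + 3, rng.normal(size=(5, 32)) + 3]  # shifted
print(fused_anomaly_scores(train, test))
```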


Neurocomputing | 2017

Graph self-representation method for unsupervised feature selection

Rongyao Hu; Xiaofeng Zhu; Debo Cheng; Wei He; Yan Yan; Jingkuan Song; Shichao Zhang

Both subspace learning methods and feature selection methods are often used for removing irrelevant features from high-dimensional data. Studies have shown that feature selection methods offer interpretability, while subspace learning methods deliver stable performance. This paper proposes a new unsupervised feature selection method that integrates a subspace learning method (i.e., Locality Preserving Projection (LPP)) into a new feature selection method (i.e., a sparse feature-level self-representation method), aiming to simultaneously achieve stable performance and interpretability. Unlike traditional sample-level self-representation, where each sample is represented by all samples and which has been widely used in machine learning and computer vision, we propose to represent each feature by its relevant features, conducting feature selection by devising a feature-level self-representation loss function plus an ℓ2,1-norm regularization term. We then add a graph regularization term (i.e., LPP) into the resulting feature selection model to simultaneously conduct feature selection and subspace learning. The rationale of the LPP regularization term is that LPP preserves the original distribution of the data after irrelevant features are removed. Finally, we conducted experiments on UCI and other real datasets, and the experimental results showed that the proposed approach outperformed all comparison algorithms.
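
A toy version of feature-level self-representation: reconstruct the feature matrix from itself as X ≈ XW and score each feature by the row norm of W. A closed-form ridge solve stands in for the paper's ℓ2,1-regularized objective with the LPP term, so this is only a structural sketch; the regularization weight is an assumption.

```python
# Feature-level self-representation with a ridge stand-in for the l2,1 solver.
import numpy as np

def self_representation_scores(X, lam=1.0):
    # Solve argmin_W ||X - X @ W||_F^2 + lam * ||W||_F^2 in closed form,
    # then rank features by the row norms of W.
    d = X.shape[1]
    G = X.T @ X
    W = np.linalg.solve(G + lam * np.eye(d), G)
    return np.linalg.norm(W, axis=1)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 10))
X[:, 3] = X[:, 0] + 0.1 * X[:, 1]   # make feature 3 representable by others
print(np.argsort(-self_representation_scores(X)))  # most relevant first
```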


Computer Vision and Pattern Recognition | 2015

Optimal graph learning with partial tags and multiple features for image and video annotation

Lianli Gao; Jingkuan Song; Feiping Nie; Yan Yan; Nicu Sebe; Heng Tao Shen

In multimedia annotation, due to time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available. This is often done by adding a geometrically based regularization term to the objective function of a supervised learning model. In this case, a similarity graph is indispensable for exploiting the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most existing works construct the graph empirically, usually based on a single feature and without using the label information. In this paper, we propose a semi-supervised annotation approach that learns an optimal graph (OGL) from multiple cues (i.e., partial tags and multiple features) and can therefore more accurately embed the relationships among the data points. We further extend our model to address out-of-sample and noisy-label issues. Extensive experiments on four public datasets show the consistent superiority of OGL over state-of-the-art methods, by up to 12% in terms of mean average precision.
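
The graph-based regularization idea can be made concrete with a small label-propagation sketch over a graph fused from multiple feature views. Note that OGL learns the graph jointly with the labels, whereas this sketch fixes an equal-weight k-NN fusion; the propagation coefficient is also an assumption.

```python
# Label propagation over a similarity graph fused from multiple feature views.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def propagate_tags(feature_views, labels, alpha=0.9, k=5):
    # labels: (n,) ints, with -1 marking untagged points.
    W = sum(kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
            for X in feature_views)            # equal-weight graph fusion
    W = np.maximum(W, W.T)
    D = np.diag(1.0 / np.sqrt(W.sum(axis=1) + 1e-12))
    S = D @ W @ D                              # symmetrically normalized affinity
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((len(labels), len(classes)))
    for j, c in enumerate(classes):
        Y[labels == c, j] = 1.0
    F = np.linalg.solve(np.eye(len(labels)) - alpha * S, Y)  # closed form
    return classes[np.argmax(F, axis=1)]       # inferred tags for every point

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(5, 1, (30, 8))])
y = -np.ones(60, dtype=int)
y[0], y[30] = 0, 1                             # only two points are tagged
print(propagate_tags([X], y))
```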


IEEE Transactions on Systems, Man, and Cybernetics | 2014

Robust Hashing With Local Models for Approximate Similarity Search

Jingkuan Song; Yi Yang; Xuelong Li; Zi Huang; Yang Yang

Similarity search plays an important role in many applications involving high-dimensional data. Due to the well-known curse of dimensionality, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods use randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions that map high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method that employs ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code using the hash functions and then explores the buckets whose hash codes are similar to the query hash code. Extensive experimental results on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
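
The bucket-exploration step of the search process can be sketched as follows: hash the query, then probe its own bucket plus all buckets within Hamming radius 1. The hash-table layout and the probe radius are assumptions made for the sketch.

```python
# Hash-bucket search: probe the query's bucket and all radius-1 neighbors.
import numpy as np
from collections import defaultdict

def build_buckets(codes):
    table = defaultdict(list)
    for i, c in enumerate(codes):
        table[tuple(c)].append(i)              # bucket keyed by the hash code
    return table

def probe(query_code, table):
    candidates = list(table.get(tuple(query_code), []))
    for b in range(len(query_code)):           # flip one bit at a time
        neighbor = query_code.copy()
        neighbor[b] ^= 1
        candidates += table.get(tuple(neighbor), [])
    return candidates                          # candidate set for re-ranking

rng = np.random.default_rng(7)
codes = rng.integers(0, 2, size=(500, 8), dtype=np.uint8)
table = build_buckets(codes)
print(len(probe(codes[0], table)))             # number of candidates retrieved
```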


IEEE Transactions on Image Processing | 2016

A Fast Optimization Method for General Binary Code Learning

Fumin Shen; Xiang Zhou; Yang Yang; Jingkuan Song; Heng Tao Shen; Dacheng Tao

Hashing or binary code learning has been recognized to accomplish efficient near neighbor search, and has thus attracted broad interest in recent retrieval, vision, and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this paper, we propose a novel binary code optimization method, dubbed discrete proximal linearized minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth and nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The resulting problem is then efficiently solved by an iterative procedure in which each iteration admits an analytical discrete solution, and which is shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, instantiated in this paper by both supervised and unsupervised hashing losses, together with bit uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 s on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale datasets, and the generated binary codes achieve very promising results on both retrieval and classification tasks.
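
The flavor of the DPLM iteration can be shown on a toy problem: each step takes a gradient step on the smooth loss and re-projects onto {-1, +1} with the sign function, which plays the role of the analytical discrete solution mentioned above. The quadratic loss and step size below are assumptions; the paper supports a much broader family of losses.

```python
# Toy DPLM-style iteration: gradient step on the smooth loss, then sign().
import numpy as np

def dplm_toy(Z, n_iters=20, step=0.5):
    # Minimize ||B - Z||_F^2 subject to B in {-1, +1}^(n x m).
    B = np.sign(Z)
    B[B == 0] = 1                     # start from a feasible point
    for _ in range(n_iters):
        grad = 2.0 * (B - Z)          # gradient of the smooth term
        B = np.sign(B - step * grad)  # projection onto {-1, +1} is sign()
        B[B == 0] = 1
    return B

rng = np.random.default_rng(8)
print(dplm_toy(rng.normal(size=(4, 8))))  # entries all in {-1, +1}
```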


IEEE Transactions on Image Processing | 2016

Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation

Jingkuan Song; Lianli Gao; Feiping Nie; Heng Tao Shen; Yan Yan; Nicu Sebe

In multimedia annotation, due to time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available. This is often done by adding a geometry-based regularization term to the objective function of a supervised learning model. In this case, a similarity graph is indispensable for exploiting the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most existing works construct the graph empirically, usually based on a single feature and without using the label information. In this paper, we propose a semi-supervised annotation approach that learns an optimized graph (OGL) from multiple cues (i.e., partial tags and multiple features) and can therefore more accurately embed the relationships among the data points. Since OGL is a transductive method and cannot deal with novel data points, we further extend our model to address the out-of-sample issue. Extensive experiments on image and video annotation show the consistent superiority of OGL over state-of-the-art methods.
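
One common way to handle the out-of-sample issue that transductive methods face is to attach a novel point to its nearest training points and take a similarity-weighted vote over their inferred tags. The sketch below illustrates that generic strategy; it is not the paper's specific extension, and the k and gamma parameters are assumptions.

```python
# Generic out-of-sample tagging: similarity-weighted vote of graph neighbors.
import numpy as np

def out_of_sample_tag(x_new, X_train, tags, k=5, gamma=1.0):
    d2 = np.sum((X_train - x_new) ** 2, axis=1)   # squared distances
    nn = np.argsort(d2)[:k]                       # k nearest training points
    w = np.exp(-gamma * d2[nn])                   # similarity weights
    return np.argmax(np.bincount(tags[nn], weights=w))

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(4, 1, (30, 4))])
tags = np.repeat([0, 1], 30)
print(out_of_sample_tag(np.full(4, 4.0), X, tags))  # expected: 1
```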

Collaboration


Dive into Jingkuan Song's collaborations.

Top Co-Authors

Heng Tao Shen, University of Electronic Science and Technology of China

Lianli Gao, University of Electronic Science and Technology of China

Fumin Shen, University of Electronic Science and Technology of China

Xing Xu, University of Electronic Science and Technology of China

Yang Yang, University of Queensland

Zi Huang, University of Queensland

Fuhao Zou, Huazhong University of Science and Technology

Yan Yan, University of Trento

Jie Shao, University of Electronic Science and Technology of China