Jianke Zhu
Zhejiang University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Jianke Zhu.
european conference on computer vision | 2016
Matej Kristan; Roman P. Pflugfelder; Aleš Leonardis; Jiri Matas; Luka Cehovin; Georg Nebehay; Tomas Vojir; Gustavo Fernández; Alan Lukezic; Aleksandar Dimitriev; Alfredo Petrosino; Amir Saffari; Bo Li; Bohyung Han; CherKeng Heng; Christophe Garcia; Dominik Pangersic; Gustav Häger; Fahad Shahbaz Khan; Franci Oven; Horst Bischof; Hyeonseob Nam; Jianke Zhu; Jijia Li; Jin Young Choi; Jin-Woo Choi; João F. Henriques; Joost van de Weijer; Jorge Batista; Karel Lebeda
Visual tracking has attracted a significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems have made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated data-sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).
european conference on computer vision | 2014
Yang Li; Jianke Zhu
Although the correlation filter-based trackers achieve the competitive results both on accuracy and robustness, there is still a need to improve the overall tracking capability. In this paper, we presented a very appealing tracker based on the correlation filter framework. To tackle the problem of the fixed template size in kernel correlation filter tracker, we suggest an effective scale adaptive scheme. Moreover, the powerful features including HoG and color-naming are integrated together to further boost the overall tracking performance. The extensive empirical evaluations on the benchmark videos and VOT 2014 dataset demonstrate that the proposed tracker is very promising for the various challenging scenarios. Our method successfully tracked the targets in about 72% videos and outperformed the state-of-the-art trackers on the benchmark dataset with 51 sequences.
computer vision and pattern recognition | 2008
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu
Active learning has been shown as a key technique for improving content-based image retrieval (CBIR) performance. Among various methods, support vector machine (SVM) active learning is popular for its application to relevance feedback in CBIR. However, the regular SVM active learning has two main drawbacks when used for relevance feedback. First, SVM often suffers from learning with a small number of labeled examples, which is the case in relevance feedback. Second, SVM active learning usually does not take into account the redundancy among examples, and therefore could select multiple examples in relevance feedback that are similar (or even identical) to each other. In this paper, we propose a novel scheme that exploits both semi-supervised kernel learning and batch mode active learning for relevance feedback in CBIR. In particular, a kernel function is first learned from a mixture of labeled and unlabeled examples. The kernel will then be used to effectively identify the informative and diverse examples for active learning via a min-max framework. An empirical study with relevance feedback of CBIR showed that the proposed scheme is significantly more effective than other state-of-the-art approaches.
computer vision and pattern recognition | 2015
Yang Li; Jianke Zhu; Steven C. H. Hoi
Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework. As the reliable patches distributed over the image, we exploit the motion trajectories to distinguish them from the background. Therefore, the visual object can be defined as the clustering of homo-trajectory patches, where a Hough voting-like scheme is employed to estimate the target state. Encouraging experimental results on a large set of sequences showed that the proposed approach is very effective and in comparison to the state-of-the-art trackers. The full source code of our implementation will be publicly available.
IEEE Transactions on Systems, Man, and Cybernetics | 2014
Luming Zhang; Yue Gao; Chaoqun Hong; Yinfu Feng; Jianke Zhu; Deng Cai
In computer vision and multimedia analysis, it is common to use multiple features (or multimodal features) to represent an object. For example, to well characterize a natural scene image, we typically extract a set of visual features to represent its color, texture, and shape. However, it is challenging to integrate multimodal features optimally. Since they are usually high-order correlated, e.g., the histogram of gradient (HOG), bag of scale invariant feature transform descriptors, and wavelets are closely related because they collaboratively reflect the image texture. Nevertheless, the existing algorithms fail to capture the high-order correlation among multimodal features. To solve this problem, we present a new multimodal feature integration framework. Particularly, we first define a new measure to capture the high-order correlation among the multimodal features, which can be deemed as a direct extension of the previous binary correlation. Therefore, we construct a feature correlation hypergraph (FCH) to model the high-order relations among multimodal features. Finally, a clustering algorithm is performed on FCH to group the original multimodal features into a set of partitions. Moreover, a multiclass boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from each partition. The experimental results on seven popular datasets show the effectiveness of our approach.
IEEE Transactions on Multimedia | 2010
Hao Ma; Jianke Zhu; Michael R. Lyu; Irwin King
With the exponential growth of Web 2.0 applications, tags have been used extensively to describe the image contents on the Web. Due to the noisy and sparse nature in the human generated tags, how to understand and utilize these tags for image retrieval tasks has become an emerging research direction. As the low-level visual features can provide fruitful information, they are employed to improve the image retrieval results. However, it is challenging to bridge the semantic gap between image contents and tags. To attack this critical problem, we propose a unified framework in this paper which stems from a two-level data fusions between the image contents and tags: 1) A unified graph is built to fuse the visual feature-based image similarity graph with the image-tag bipartite graph; 2) A novel random walk model is then proposed, which utilizes a fusion parameter to balance the influences between the image contents and tags. Furthermore, the presented framework not only can naturally incorporate the pseudo relevance feedback process, but also it can be directly applied to applications such as content-based image retrieval, text-based image retrieval, and image annotation. Experimental analysis on a large Flickr dataset shows the effectiveness and efficiency of our proposed framework.
ACM Transactions on Information Systems | 2009
Steven C. H. Hoi; Rong Jin; Jianke Zhu; Michael R. Lyu
Support vector machine (SVM) active learning is one popular and successful technique for relevance feedback in content-based image retrieval (CBIR). Despite the success, conventional SVM active learning has two main drawbacks. First, the performance of SVM is usually limited by the number of labeled examples. It often suffers a poor performance for the small-sized labeled examples, which is the case in relevance feedback. Second, conventional approaches do not take into account the redundancy among examples, and could select multiple examples that are similar (or even identical). In this work, we propose a novel scheme for explicitly addressing the drawbacks. It first learns a kernel function from a mixture of labeled and unlabeled data, and therefore alleviates the problem of small-sized training data. The kernel will then be used for a batch mode active learning method to identify the most informative and diverse examples via a min-max framework. Two novel algorithms are proposed to solve the related combinatorial optimization: the first approach approximates the problem into a quadratic program, and the second solves the combinatorial optimization approximately by a greedy algorithm that exploits the merits of submodular functions. Extensive experiments with image retrieval using both natural photo images and medical images show that the proposed algorithms are significantly more effective than the state-of-the-art approaches. A demo is available at http://msm.cais.ntu.edu.sg/LSCBIR/.
acm multimedia | 2008
Jianke Zhu; Steven C. H. Hoi; Michael R. Lyu; Shuicheng Yan
Near-duplicate image retrieval plays an important role in many real-world multimedia applications. Most previous approaches have some limitations. For example, conventional appearance-based methods may suffer from the illumination variations and occlusion issue, and local feature correspondence-based methods often do not consider local deformations and the spatial coherence between two point sets. In this paper, we propose a novel and effective Nonrigid Image Matching (NIM) approach to tackle the task of near-duplicate keyframe retrieval from real-world video corpora. In contrast to previous approaches, the NIM technique can recover an explicit mapping between two near-duplicate images with a few deformation parameters and find out the correct correspondences from noisy data effectively. To make our technique applicable to large-scale applications, we suggest an effective multi-level ranking scheme that filters out the irrelevant results in a coarse-to-fine manner. In our ranking scheme, to overcome the extremely small training size challenge, we employ a semi-supervised learning method for improving the performance using unlabeled data. To evaluate the effectiveness of our solution, we have conducted extensive experiments on two benchmark testbeds extracted from the TRECVID2003 and TRECVID2004 corpora. The promising results show that our proposed method is more effective than other state-of-the-art approaches for near-duplicate keyframe retrieval.
IEEE Transactions on Knowledge and Data Engineering | 2013
Chenxia Wu; Jianke Zhu; Deng Cai; Chun Chen; Jiajun Bu
In this paper, we study the effective semi-supervised hashing method under the framework of regularized learning-based hashing. A nonlinear hash function is introduced to capture the underlying relationship among data points. Thus, the dimensionality of the matrix for computation is not only independent from the dimensionality of the original data space but also much smaller than the one using linear hash function. To effectively deal with the error accumulated during converting the real-value embeddings into the binary code after relaxation, we propose a semi-supervised nonlinear hashing algorithm using bootstrap sequential projection learning which effectively corrects the errors by taking into account of all the previous learned bits holistically without incurring the extra computational overhead. Experimental results on the six benchmark data sets demonstrate that the presented method outperforms the state-of-the-art hashing algorithms at a large margin.
IEEE Transactions on Multimedia | 2008
Jianke Zhu; Steven C. H. Hoi; Michael R. Lyu
Face annotation in images and videos enjoys many potential applications in multimedia information retrieval. Face annotation usually requires many training data labeled by hand in order to build effective classifiers. This is particularly challenging when annotating faces on large-scale collections of media data, in which huge labeling efforts would be very expensive. As a result, traditional supervised face annotation methods often suffer from insufficient training data. To attack this challenge, in this paper, we propose a novel Transductive Kernel Fisher Discriminant (TKFD) scheme for face annotation, which outperforms traditional supervised annotation methods with few training data. The main idea of our approach is to solve the Fishers discriminant using deformed kernels incorporating the information of both labeled and unlabeled data. To evaluate the effectiveness of our method, we have conducted extensive experiments on three types of multimedia testbeds: the FRGC benchmark face dataset, the Yahoo! web image collection, and the TRECVID video data collection. The experimental results show that our TKFD algorithm is more effective than traditional supervised approaches, especially when there are very few training data.