Publication


Featured research published by Cuicui Kang.


IEEE Transactions on Multimedia | 2015

Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval

Cuicui Kang; Shiming Xiang; Shengcai Liao; Changsheng Xu; Chunhong Pan

Cross-modal feature matching has gained much attention in recent years and has many practical applications, such as text-to-image retrieval. The most difficult problem in cross-modal matching is how to eliminate the heterogeneity between modalities. Existing methods (e.g., CCA and PLS) try to learn a common latent subspace in which the heterogeneity between two modalities is minimized so that cross-matching is possible. However, most of these methods require fully paired samples and have difficulty dealing with unpaired data. Besides, utilizing class label information has been found to be a good way to reduce the semantic gap between low-level image features and high-level document descriptions. Considering this, we propose a novel and effective supervised algorithm that can also deal with unpaired data. In the proposed formulation, the basis matrices of different modalities are jointly learned from the training samples. Moreover, a local group-based prior is introduced into the formulation to make better use of popular block-based features (e.g., HOG and GIST). Extensive experiments are conducted on four public databases: Pascal VOC2007, LabelMe, Wikipedia, and NUS-WIDE. We also evaluate the proposed algorithm with unpaired data. Compared with existing state-of-the-art algorithms, the proposed algorithm is more robust and achieves the best performance, outperforming the second-best algorithm by about 5% on both the Pascal VOC2007 and NUS-WIDE databases.
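
As a rough illustration of the formulation described above, the sketch below jointly learns two basis (projection) matrices so that image features X and text features Y map toward a shared class-indicator target S, with a group-sparse penalty over feature blocks standing in for the local group-based prior. The objective, variable names, and optimizer are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def prox_group_l21(U, groups, lam):
    """Proximal operator of lam * sum_g ||U[g, :]||_F over row groups,
    shrinking whole feature blocks (e.g., HOG cells) toward zero."""
    U = U.copy()
    for g in groups:
        norm = np.linalg.norm(U[g])
        U[g] = 0.0 if norm <= lam else U[g] * (1.0 - lam / norm)
    return U

def joint_basis_learning(X, Y, S, groups_x, lam=0.1, lr=1e-3, iters=500):
    """Jointly learn bases U (images) and V (texts) toward shared targets S.
    X: n x dx image features, Y: n x dy text features, S: n x c labels."""
    U = np.zeros((X.shape[1], S.shape[1]))
    V = np.zeros((Y.shape[1], S.shape[1]))
    for _ in range(iters):
        U -= lr * (X.T @ (X @ U - S))              # gradient step, image branch
        U = prox_group_l21(U, groups_x, lr * lam)  # block-wise sparsity on U
        V -= lr * (Y.T @ (Y @ V - S))              # gradient step, text branch
    return U, V
```

At test time a query from either modality is projected through its own basis and candidates are ranked by distance in the shared space; because each branch regresses onto S independently in this sketch, unpaired samples can still contribute to their own branch.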


International Conference on Image Processing | 2011

Kernel sparse representation with local patterns for face recognition

Cuicui Kang; Shengcai Liao; Shiming Xiang; Chunhong Pan

In this paper we propose a novel kernel sparse representation classification (SRC) framework and utilize the local binary pattern (LBP) descriptor in this framework for robust face recognition. First, we develop a kernel coordinate descent (KCD) algorithm for ℓ1 minimization in the kernel space, based on the covariance update technique. Then we extract LBP descriptors from each image and apply two types of kernels (χ2-distance based and Hamming-distance based) with the proposed KCD algorithm under the SRC framework for face recognition. Experiments on both the Extended Yale B and the PIE face databases show that the proposed method is more robust against noise, occlusion, and illumination variations, even with a small number of training samples.
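
A minimal sketch of coordinate descent for ℓ1-regularized least squares carried out purely through a Gram matrix is given below, so that any positive-definite kernel (such as a χ2 kernel on LBP histograms) can be plugged in. This mirrors the spirit of the KCD algorithm, though the covariance-update details are omitted and the helper names are ours.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def kernel_cd_lasso(K, k_y, lam=0.01, iters=100):
    """Minimize 0.5*a'Ka - k_y'a + lam*||a||_1 by coordinate descent.
    K: train-train kernel matrix; k_y: kernels between train set and probe."""
    n = K.shape[0]
    a = np.zeros(n)
    for _ in range(iters):
        for j in range(n):
            # correlation of coordinate j with the residual, own term excluded
            rho = k_y[j] - K[j] @ a + K[j, j] * a[j]
            a[j] = soft_threshold(rho, lam) / K[j, j]
    return a

def chi2_kernel(X, Y, gamma=1.0):
    """Chi-square kernel for nonnegative histogram features such as LBP."""
    d = (X[:, None, :] - Y[None, :, :]) ** 2 / (X[:, None, :] + Y[None, :, :] + 1e-12)
    return np.exp(-gamma * d.sum(axis=2))
```

Under the SRC decision rule, the probe is then assigned to the class whose training samples' coefficients give the smallest reconstruction residual, computed in the same kernel space.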


Conference on Information and Knowledge Management | 2015

Cross-Modal Similarity Learning: A Low Rank Bilinear Formulation

Cuicui Kang; Shengcai Liao; Yonghao He; Jian Wang; Wenjia Niu; Shiming Xiang; Chunhong Pan

The cross-media retrieval problem has received much attention in recent years due to the rapid growth of multimedia data on the Internet. A new line of work addresses the problem by matching features of different modalities directly. Two critical issues arise in this setting: how to eliminate the heterogeneity between different modalities, and how to match cross-modal features of different dimensions. Metric learning methods have shown a good capability for learning a distance metric that captures the relationship between data points. However, traditional metric learning algorithms focus only on single-modal features and have difficulty handling cross-modal features of different dimensions. In this paper, we propose a cross-modal similarity learning algorithm for cross-modal feature matching. The proposed method takes a bilinear formulation and, through nuclear-norm penalization, achieves a low-rank representation. Accordingly, the accelerated proximal gradient (APG) algorithm can be employed to find the optimal solution with a fast convergence rate of O(1/t²). Experiments on three well-known image-text cross-media retrieval databases show that the proposed method achieves the best performance compared with state-of-the-art algorithms.
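
The following is a hedged sketch of the low-rank bilinear idea: similarity is s(x, y) = x'Wy, and W is learned by accelerated proximal gradient with singular value thresholding as the proximal step for the nuclear norm (the FISTA-style momentum is what gives the O(1/t²) rate cited above). The squared loss is an illustrative stand-in for the paper's objective.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the prox of tau * ||W||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def apg_bilinear(X, Y, labels, lam=0.1, lr=1e-3, iters=200):
    """Learn W for s(x, y) = x'Wy. X: n x dx, Y: n x dy, labels in {+1, -1}."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    W_prev, Z, t = W.copy(), W.copy(), 1.0
    for _ in range(iters):
        scores = np.einsum('id,de,ie->i', X, Z, Y)      # x_i' Z y_i per pair
        grad = X.T @ ((scores - labels)[:, None] * Y)   # squared-loss gradient
        W = svt(Z - lr * grad, lr * lam)                # proximal step
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        Z = W + ((t - 1) / t_next) * (W - W_prev)       # Nesterov momentum
        W_prev, t = W, t_next
    return W
```

Note that W need not be square, so the two modalities may keep different feature dimensions; absorbing that mismatch is exactly what the bilinear form is for.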


IEEE Transactions on Multimedia | 2016

Cross-Modal Retrieval via Deep and Bidirectional Representation Learning

Yonghao He; Shiming Xiang; Cuicui Kang; Jian Wang; Chunhong Pan

Cross-modal retrieval emphasizes understanding inter-modality semantic correlations, which is often achieved by designing a similarity function. Generally, one of the most important considerations for the similarity function is how to make the cross-modal similarity computable. In this paper, a deep and bidirectional representation learning model is proposed to address image-text cross-modal retrieval. Owing to the solid progress of deep learning in computer vision and natural language processing, it is reliable to extract semantic representations from both raw image and text data using deep neural networks. Therefore, in the proposed model, two convolution-based networks are adopted to accomplish representation learning for images and texts. Through these networks, images and texts are mapped to a common space, in which the cross-modal similarity is measured by cosine distance. Subsequently, a bidirectional network architecture is designed to capture a defining property of cross-modal retrieval: bidirectional search. This architecture is characterized by simultaneously involving matched and unmatched image-text pairs during training. Accordingly, a learning framework with a maximum likelihood criterion is developed, and the network parameters are optimized via backpropagation and stochastic gradient descent. Extensive experiments are conducted to evaluate the proposed method on three publicly released datasets: IAPR TC-12, Flickr30k, and Flickr8k. The overall results show that the proposed architecture is effective and that the learned representations have good semantics, achieving superior cross-modal retrieval performance.
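
At retrieval time the model reduces to scoring in the common space. The sketch below illustrates that step with placeholder linear encoders standing in for the paper's convolutional networks, and shows how a single similarity matrix serves both search directions.

```python
import numpy as np

def l2_normalize(Z, eps=1e-12):
    return Z / (np.linalg.norm(Z, axis=1, keepdims=True) + eps)

def cosine_scores(img_emb, txt_emb):
    """Full image-text cosine similarity matrix (rows: images, cols: texts)."""
    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T

rng = np.random.default_rng(0)
W_img = rng.normal(size=(4096, 128))     # stand-in for the image network
W_txt = rng.normal(size=(300, 128))      # stand-in for the text network
images = rng.normal(size=(5, 4096))      # raw image features
texts = rng.normal(size=(7, 300))        # raw text features

S = cosine_scores(images @ W_img, texts @ W_txt)
texts_for_image0 = np.argsort(-S[0])     # image -> text search direction
images_for_text0 = np.argsort(-S[:, 0])  # text -> image search direction
```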


International Conference on Multimedia Retrieval | 2015

Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning

Jian Wang; Yonghao He; Cuicui Kang; Shiming Xiang; Chunhong Pan

Cross-modal retrieval extends the ability of search engines to deal with massive amounts of cross-modal data. The goal of image-text cross-modal retrieval is to search for images (texts) using text (image) queries by computing the similarities between images and texts directly. Many existing methods rely on low-level visual and textual features for cross-modal retrieval, ignoring the characteristics of the raw data in different modalities. In this paper, a novel model based on modality-specific feature learning is proposed. Considering the characteristics of different modalities, the model uses two types of convolutional neural networks to map the raw data to latent space representations for images and texts, respectively. In particular, the convolution-based network used for texts involves word embedding learning, which has proven effective for extracting meaningful textual features for text classification. In the latent space, the mapped features of images and texts form relevant and irrelevant image-text pairs, which are used by a one-vs-more learning scheme. This scheme achieves ranking functionality by pitting one relevant pair against multiple irrelevant pairs. The standard backpropagation technique is employed to update the parameters of the two convolutional networks. Extensive cross-modal retrieval experiments are carried out on three challenging datasets consisting of image-document pairs or image-query click-through data from a search engine, and the results demonstrate that the proposed model is substantially more effective.
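
A plausible instantiation of the one-vs-more scheme is sketched below: each relevant image-text pair is scored against several irrelevant pairs for the same query, and a softmax-style loss pushes the relevant score above the rest. This particular loss is an assumption for illustration, not necessarily the paper's exact choice.

```python
import numpy as np

def one_vs_more_loss(rel_score, irrel_scores):
    """Negative log-probability that the one relevant pair outscores
    all the irrelevant pairs grouped with it."""
    scores = np.concatenate(([rel_score], irrel_scores))
    scores = scores - scores.max()                 # numerical stability
    return -scores[0] + np.log(np.exp(scores).sum())

# one relevant pair against three irrelevant pairs for the same query
print(one_vs_more_loss(0.9, np.array([0.1, -0.3, 0.2])))
```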


Neurocomputing | 2014

Kernel sparse representation with pixel-level and region-level local feature kernels for face recognition

Cuicui Kang; Shengcai Liao; Shiming Xiang; Chunhong Pan

Face recognition has been popular in the pattern recognition field for decades, but it remains a difficult problem due to various image distortions. Recently, sparse representation based classification (SRC) was proposed as a novel image classification approach that is very effective given sufficient training samples for each class. However, performance drops when the number of training samples is limited. In this paper, we show that effective local image features and appropriate nonlinear kernels are needed to derive a better classification method based on sparse representation. Thus, we propose a novel kernel SRC framework and utilize effective local image features within it for robust face recognition. First, we present a kernel coordinate descent (KCD) algorithm for the LASSO problem in the kernel space and integrate it into the SRC framework (called KCD-SRC) for face recognition. Second, we employ local image features and develop both pixel-level and region-level kernels for KCD-SRC based face recognition, making it discriminative and robust against illumination variations and occlusions. Extensive experiments are conducted on three public face databases (Extended YaleB, CMU-PIE, and AR) under illumination variations, noise corruption, continuous occlusions, and registration errors, demonstrating the excellent performance of the KCD-SRC algorithm combined with the proposed kernels.
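
To make the region-level kernel idea concrete, here is an illustrative sketch: each face is divided into a grid of regions, an LBP-style histogram is computed per region, and per-region χ2 kernels are averaged into a single kernel value. The grid size, binning, and averaging are simplifying assumptions rather than the paper's exact construction.

```python
import numpy as np

def region_histograms(img, grid=4, bins=16):
    """Split an 8-bit image into grid x grid regions; histogram each region."""
    h, w = img.shape
    hists = []
    for i in range(grid):
        for j in range(grid):
            patch = img[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256), density=True)
            hists.append(hist)
    return np.array(hists)                        # (grid*grid, bins)

def region_level_kernel(img_a, img_b, gamma=1.0):
    """Average of per-region chi-square kernels between two face images."""
    Ha, Hb = region_histograms(img_a), region_histograms(img_b)
    d = ((Ha - Hb) ** 2 / (Ha + Hb + 1e-12)).sum(axis=1)  # chi-square per region
    return np.exp(-gamma * d).mean()
```

A pixel-level kernel would instead compare responses position by position; either way, the resulting Gram matrix drops straight into a kernel-space LASSO solver like the one sketched earlier.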


International Conference on Multimedia Retrieval | 2015

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Yonghao He; Jian Wang; Cuicui Kang; Shiming Xiang; Chunhong Pan

In this paper, we focus on large scale image annotation, whereas most existing methods are devised for small datasets. A novel model based on deep representation learning and tag embedding learning is proposed. Specifically, the proposed model learns a unified latent space for image visual features and tag embeddings simultaneously. Furthermore, a metric matrix is introduced to estimate the relevance scores between images and tags. Finally, an objective function modeling triplet relationships (irrelevant tag, image, relevant tag) is proposed with maximum margin pursuit. The proposed model easily accommodates new images and tags via online learning and has a relatively low test-time computational complexity. Experimental results on the NUS-WIDE dataset demonstrate the effectiveness of the proposed model.
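
The scoring and objective described above can be sketched as follows: an image representation and a tag embedding are compared through a metric matrix M, and a hinge loss enforces a margin between relevant and irrelevant tags. The margin value and function names are illustrative assumptions.

```python
import numpy as np

def relevance(x, t, M):
    """Bilinear relevance score between an image embedding and a tag embedding."""
    return x @ M @ t

def triplet_hinge(x, t_pos, t_neg, M, margin=1.0):
    """Loss for one (irrelevant tag, image, relevant tag) triplet:
    max(0, margin - s(x, t+) + s(x, t-))."""
    return max(0.0, margin - relevance(x, t_pos, M) + relevance(x, t_neg, M))

rng = np.random.default_rng(0)
x = rng.normal(size=64)                                  # image representation
t_pos, t_neg = rng.normal(size=32), rng.normal(size=32)  # tag embeddings
M = 0.01 * rng.normal(size=(64, 32))                     # metric matrix
print(triplet_hinge(x, t_pos, t_neg, M))
```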


Neurocomputing | 2015

Image tag-ranking via pairwise supervision based semi-supervised model

Yonghao He; Cuicui Kang; Jian Wang; Shiming Xiang; Chunhong Pan

Image tag-ranking, the task of sorting tags by their relevance to the associated images, has become a hot topic in the field of multimedia. Most existing methods do not incorporate tag-ranking order information into their models, even though it is crucial for solving the image tag-ranking problem. In this paper, by taking advantage of this information, we propose a novel model that uses images with ranked tag lists as its supervision. In the proposed method, each ranked tag list is decomposed into a number of image-tag pairs, all of which are pooled together to train a scoring function. With this pairwise supervision, the model is able to capture the intrinsic ranking structure. In addition, unsupervised data, namely images with unranked tag lists, is also integrated to mine the binary relevance relation: relevant or irrelevant. By leveraging both the pairwise supervision and the unsupervised structural information, our model fully exploits the relevance of tags to images as well as the ranking structure of tag lists. Extensive experiments are conducted on both image tag-ranking and tag-based image search with three benchmark datasets: SUN Attribute, LabelMe, and MSRC, demonstrating the effectiveness of the proposed model.
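
The pairwise decomposition is easy to picture: a ranked tag list for one image expands into all ordered tag pairs, each becoming a constraint that the higher-ranked tag should score higher for that image. The tuple format below is an assumption for illustration.

```python
from itertools import combinations

def ranked_list_to_pairs(image_id, ranked_tags):
    """Expand one ranked tag list into (image, better tag, worse tag) pairs."""
    return [(image_id, hi, lo) for hi, lo in combinations(ranked_tags, 2)]

print(ranked_list_to_pairs(42, ["sky", "tree", "car"]))
# [(42, 'sky', 'tree'), (42, 'sky', 'car'), (42, 'tree', 'car')]
```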


ACM Multimedia | 2014

Cross Modal Deep Model and Gaussian Process Based Model for MSR-Bing Challenge

Jian Wang; Cuicui Kang; Yonghao He; Shiming Xiang; Chunhong Pan

In the MSR-Bing Image Retrieval Challenge, contestants are required to design a system that scores query-image pairs based on the relevance between queries and images. To address this problem, we propose a regression-based cross-modal deep learning model and a Gaussian process scoring model. The regression-based cross-modal deep learning model takes image features and query features as inputs and outputs the relevance scores directly. The Gaussian process scoring model treats the challenge as a ranking problem and utilizes the click (or pseudo-click) information from both the training set and the development set to predict relevance scores. The proposed models are used in different situations: matched and mismatched queries. Experiments on the development set show the effectiveness of the proposed models.
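
As a rough sketch of the Gaussian process component (with feature construction and targets as placeholder assumptions), one can fit a GP regressor on joint query-image features with click-derived relevance as the target and use the posterior mean as the score:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
pair_feats = rng.normal(size=(100, 16))  # joint query-image features (placeholder)
clicks = rng.random(100)                 # click-derived relevance targets

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp.fit(pair_feats, clicks)
scores = gp.predict(rng.normal(size=(5, 16)))  # relevance for unseen pairs
```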


Computer Vision and Pattern Recognition | 2013

Local Sparse Discriminant Analysis for Robust Face Recognition

Cuicui Kang; Shengcai Liao; Shiming Xiang; Chunhong Pan

The Linear Discriminant Analysis (LDA) algorithm plays an important role in pattern recognition. A common issue is that LDA and many of its variants learn dense bases, which are not robust to local image distortions and partial occlusions. Recently, the LASSO penalty has been incorporated into LDA to learn sparse bases. However, since the learned sparse coefficients are distributed globally over the basis image, the solution is still not robust to partial occlusions. In this paper, we propose a Local Sparse Discriminant Analysis (LoSDA) method, which aims to learn discriminant bases that consist of local object parts. In this way, it is more robust than dense or global-basis LDA algorithms for visual classification. The proposed model is formulated as a constrained least squares regression problem with group sparse regularization. Furthermore, we derive a weighted LoSDA (WLoSDA) approach to learn localized basis images, which also enables multi-subspace learning and fusion. Finally, we develop an algorithm based on the Accelerated Proximal Gradient (APG) technique to solve the resulting weighted group sparse optimization problem. Experimental results on the FRGC v2.0 and AR face databases show that the proposed LoSDA and WLoSDA algorithms both outperform other state-of-the-art discriminant subspace learning algorithms under illumination variations and occlusions.
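
The weighted group-sparse proximal step at the heart of this formulation can be sketched as below: the coefficients of a basis vector are grouped by local spatial patches, and each group is shrunk with its own weight, so entire patches are driven to zero and the surviving basis consists of local parts. The patch grouping and weights here are assumptions for illustration.

```python
import numpy as np

def weighted_group_shrink(b, groups, weights, tau):
    """Prox of tau * sum_g w_g * ||b[g]||_2 applied to one basis vector b."""
    b = b.copy()
    for g, w in zip(groups, weights):
        norm = np.linalg.norm(b[g])
        b[g] *= max(0.0, 1.0 - tau * w / (norm + 1e-12))  # group soft-threshold
    return b

# a 64-pixel basis vector grouped into eight contiguous 8-pixel patches
b = np.random.default_rng(0).normal(size=64)
groups = [np.arange(i, i + 8) for i in range(0, 64, 8)]
b_local = weighted_group_shrink(b, groups, np.ones(8), tau=2.0)
```

Inside an APG loop (as sketched for the bilinear model above), this shrinkage replaces singular value thresholding as the proximal step.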

Collaboration


Dive into Cuicui Kang's collaborations.

Top Co-Authors

Chunhong Pan (Chinese Academy of Sciences)
Shiming Xiang (Chinese Academy of Sciences)
Shengcai Liao (Chinese Academy of Sciences)
Jian Wang (Chinese Academy of Sciences)
Yonghao He (Chinese Academy of Sciences)
Changsheng Xu (Chinese Academy of Sciences)
Gang Xiong (Chinese Academy of Sciences)
Wenjia Niu (Chinese Academy of Sciences)
Zhen Li (Chinese Academy of Sciences)
Zigang Cao (Chinese Academy of Sciences)