Liang-Tien Chia
Nanyang Technological University
Publications
Featured research published by Liang-Tien Chia.
Computer Vision and Pattern Recognition | 2010
Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia; Peilin Zhao
Sparse coding, which encodes the original signal in a sparse signal space, has shown state-of-the-art performance in the visual codebook generation and feature quantization processes of Bag-of-Words (BoW) based image representation. However, in the feature quantization process of sparse coding, similar local features may be quantized into different visual words of the codebook because the quantization is sensitive to small feature variations. In this paper, to alleviate this problem, we propose a Laplacian sparse coding method that exploits the dependence among local features. Specifically, we use a histogram-intersection-based kNN method to construct a Laplacian matrix, which characterizes the similarity of local features well. We then incorporate this Laplacian matrix into the objective function of sparse coding to preserve consistency in the sparse representations of similar local features. Comprehensive experimental results show that our method matches or outperforms existing state-of-the-art results, and it exhibits excellent performance on the Scene 15 dataset.
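To make the construction concrete, here is a minimal NumPy sketch of a histogram-intersection kNN graph Laplacian of the kind described above; the function names, neighbourhood size k, and symmetrisation step are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Similarity between two local-feature histograms.
    return np.minimum(h1, h2).sum()

def knn_laplacian(features, k=5):
    """Build the graph Laplacian L = D - W from a histogram-intersection
    kNN graph over local features (one feature histogram per row).
    L then enters the LSc objective roughly as
        min ||X - B S||_F^2 + lam * ||S||_1 + beta * tr(S L S^T)."""
    n = len(features)
    sim = np.array([[histogram_intersection(f, g) for g in features]
                    for f in features])
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(-sim[i])[1:k + 1]   # k nearest neighbours, self excluded
        W[i, nbrs] = sim[i, nbrs]
    W = np.maximum(W, W.T)                    # symmetrise the kNN graph
    return np.diag(W.sum(axis=1)) - W
```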
European Conference on Computer Vision | 2010
Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia
Recent research has shown the effectiveness of using sparse coding(Sc) to solve many computer vision problems. Motivated by the fact that kernel trick can capture the nonlinear similarity of features, which may reduce the feature quantization error and boost the sparse coding performance, we propose Kernel Sparse Representation(KSR). KSR is essentially the sparse coding technique in a high dimensional feature space mapped by implicit mapping function. We apply KSR to both image classification and face recognition. By incorporating KSR into Spatial Pyramid Matching(SPM), we propose KSRSPM for image classification. KSRSPM can further reduce the information loss in feature quantization step compared with Spatial Pyramid Matching using Sparse Coding(ScSPM). KSRSPM can be both regarded as the generalization of Efficient Match Kernel(EMK) and an extension of ScSPM. Compared with sparse coding, KSR can learn more discriminative sparse codes for face recognition. Extensive experimental results show that KSR outperforms sparse coding and EMK, and achieves state-of-the-art performance for image classification and face recognition on publicly available datasets.
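As a rough illustration, the KSR code for one feature can be found by noting that ||phi(x) - phi(B)s||^2 expands into kernel evaluations only, so phi never needs to be materialised. The ISTA-style solver below is a hedged sketch under that expansion; the names K_BB and k_Bx, the step-size rule, and the iteration count are assumptions, not the paper's optimizer.

```python
import numpy as np

def ksr_codes(K_BB, k_Bx, lam=0.1, n_iter=200):
    """ISTA sketch for  min_s  s^T K_BB s - 2 s^T k_Bx + lam * ||s||_1,
    the kernel-expanded form of ||phi(x) - phi(B) s||^2 + lam * ||s||_1
    (the constant k(x, x) term is dropped).  K_BB is the codebook Gram
    matrix; k_Bx holds kernel values between codebook atoms and x."""
    s = np.zeros_like(k_Bx, dtype=float)
    step = 1.0 / (2 * np.linalg.norm(K_BB, 2))   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2 * (K_BB @ s - k_Bx)             # gradient of the smooth part
        z = s - step * grad
        s = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return s
```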
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia
Sparse coding exhibits good performance in many computer vision applications. However, due to the overcomplete codebook and the independent coding process, the locality of and the similarity among the instances to be encoded are lost. To preserve this locality and similarity information, we propose a Laplacian sparse coding (LSc) framework. By incorporating a similarity-preserving term into the objective of sparse coding, our proposed Laplacian sparse coding alleviates the instability of sparse codes. Furthermore, we propose Hypergraph Laplacian sparse coding (HLSc), which extends our Laplacian sparse coding to the case where the similarity among the instances is defined by a hypergraph. Specifically, HLSc captures the similarity among instances within the same hyperedge and encourages their sparse codes to be similar to each other. Both Laplacian sparse coding and Hypergraph Laplacian sparse coding enhance the robustness of sparse coding. We apply Laplacian sparse coding to feature quantization in Bag-of-Words image representation, where it outperforms sparse coding and performs well on the image classification problem. Hypergraph Laplacian sparse coding is also successfully used to solve the semi-automatic image tagging problem. The good performance in these applications demonstrates the effectiveness of our proposed formulations in preserving locality and similarity.
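For readers unfamiliar with hypergraph Laplacians, the sketch below builds the standard normalized hypergraph Laplacian from an incidence matrix, which is one common way to realize a hyperedge-level similarity term like the one described above; whether the paper uses exactly this normalization is an assumption.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalised hypergraph Laplacian
        L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2},
    with H the |V| x |E| incidence matrix, w the hyperedge weights,
    Dv the vertex degrees and De the hyperedge degrees.  Assumes every
    vertex belongs to at least one hyperedge."""
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else w
    d_v = H @ w                          # weighted vertex degrees
    d_e = H.sum(axis=0)                  # hyperedge degrees (vertices per edge)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    Theta = Dv_inv_sqrt @ H @ np.diag(w / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n_v) - Theta
```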
IEEE Transactions on Image Processing | 2013
Shenghua Gao; Ivor W. Tsang; Liang-Tien Chia
Recent research has shown the initial success of sparse coding (Sc) in solving many computer vision tasks. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which helps in finding a sparse representation of nonlinear features, we propose kernel sparse representation (KSR). Essentially, KSR is a sparse coding technique in a high-dimensional feature space induced by an implicit mapping function. We apply KSR to feature coding in image classification, face recognition, and kernel matrix approximation. More specifically, by incorporating KSR into spatial pyramid matching (SPM), we develop KSRSPM, which achieves good performance for image classification. Moreover, KSR-based feature coding can be shown to be a generalization of the efficient match kernel and an extension of Sc-based SPM. We further show that our proposed KSR with a histogram intersection kernel (HIK) can be considered a soft-assignment extension of HIK-based feature quantization in the feature coding process. Beyond feature coding, KSR can learn more discriminative sparse codes than sparse coding and achieve higher accuracy for face recognition. KSR can also be applied to kernel matrix approximation in large-scale learning tasks, where it proves robust, especially when only a small fraction of the data is used. Extensive experiments demonstrate promising results for KSR in image classification, face recognition, and kernel matrix approximation, proving its effectiveness in computer vision and machine learning tasks.
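The kernel matrix approximation use mentioned above follows directly from the KSR model: if phi(x_i) is approximated by phi(B)s_i, the full Gram matrix factors through the codebook. A minimal sketch, with illustrative shapes and names:

```python
import numpy as np

def approximate_kernel(S, K_BB):
    """Given KSR codes S (one column per sample) over a codebook B, each
    sample satisfies phi(x_i) ~ phi(B) @ s_i, so the n x n Gram matrix
    can be approximated without ever materialising phi:
        K_hat[i, j] = s_i^T K_BB s_j."""
    return S.T @ K_BB @ S
```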
European Conference on Computer Vision | 2008
Ziming Zhang; Yiqun Hu; Syin Chan; Liang-Tien Chia
One of the key challenges in human action recognition from video sequences is how to model an action sufficiently. In this paper we therefore propose a novel motion-based representation called Motion Context (MC), which is insensitive to the scale and direction of an action, by employing image representation techniques. An MC captures the distribution of motion words (MWs) over relative locations in a local region of the motion image (MI) around a reference point, and thus summarizes the local motion information in a rich 3D MC descriptor. In this way, any human action can be represented as a 3D descriptor by summing up all the MC descriptors of that action. For action recognition, we propose four recognition configurations: MW+pLSA, MW+SVM, MC+w3-pLSA (a new direct graphical model extending pLSA), and MC+SVM. We test our approach on two human action video datasets, from KTH and the Weizmann Institute of Science (WIS), and the results are quite promising. On the KTH dataset, the proposed MC representation achieves the highest performance using the proposed w3-pLSA. On the WIS dataset, the best performance of the proposed MC is comparable to the state of the art.
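A hedged sketch of the MC descriptor idea: histogram motion-word labels over polar bins relative to a reference point, producing a (radial, angular, word) volume. The linear radial binning, bin counts, and normalisation below are illustrative placeholders, not the paper's exact parameters.

```python
import numpy as np

def motion_context(mw_labels, mw_coords, ref, n_words=50,
                   n_radial=3, n_angular=8, max_r=1.0):
    """Histogram of motion-word labels over polar bins around a reference
    point, yielding a (radial, angular, word) 3-D MC descriptor."""
    mw_coords = np.asarray(mw_coords, dtype=float)
    desc = np.zeros((n_radial, n_angular, n_words))
    d = mw_coords - np.asarray(ref, dtype=float)
    r = np.linalg.norm(d, axis=1)
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    r_bin = np.minimum((r / max_r * n_radial).astype(int), n_radial - 1)
    a_bin = (theta / (2 * np.pi) * n_angular).astype(int) % n_angular
    for rb, ab, wd in zip(r_bin, a_bin, mw_labels):
        desc[rb, ab, wd] += 1
    return desc / max(desc.sum(), 1.0)   # normalise to a distribution
```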
IEEE Transactions on Circuits and Systems for Video Technology | 2014
Zhixiang Ren; Shenghua Gao; Liang-Tien Chia; Ivor W. Tsang
The objective of this paper is twofold. First, we introduce an effective region-based solution for saliency detection. Then, we apply the resulting saliency map to better encode image features for solving the object recognition task. To find perceptually and semantically meaningful salient regions, we extract superpixels based on an adaptive mean shift algorithm as the basic elements for saliency detection. The saliency of each superpixel is measured by its spatial compactness, which is calculated according to the results of Gaussian mixture model (GMM) clustering. To propagate saliency between similar clusters, we adopt a modified PageRank algorithm to refine the saliency map. Our method not only improves saliency detection through large salient region detection and noise tolerance in cluttered backgrounds, but also generates saliency maps with a well-defined object shape. Experimental results demonstrate the effectiveness of our method. Since objects usually correspond to salient regions, and these regions usually play a more important role in object recognition than the background, we apply our saliency map to object recognition by incorporating it into the sparse coding-based spatial pyramid matching (ScSPM) image representation. To learn a more discriminative codebook and better encode the features corresponding to object patches, we propose a weighted sparse coding for feature coding. Moreover, we propose a saliency-weighted max pooling to further emphasize the importance of salient regions in the feature pooling module. Experimental results on several datasets illustrate that our weighted ScSPM framework greatly outperforms the original ScSPM framework and achieves excellent performance for object recognition.
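As a small illustration of the pooling change, the sketch below scales each local code by its patch saliency before max pooling within a spatial cell; the interface is an assumption for illustration, not the paper's code.

```python
import numpy as np

def saliency_weighted_max_pool(codes, saliency):
    """Saliency-weighted max pooling over one spatial cell: each local
    descriptor's sparse code (one row of `codes`) is scaled by its patch
    saliency (a 1-D array) before taking the per-dimension maximum."""
    return (codes * saliency[:, None]).max(axis=0)
```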
Advances in Multimedia | 2004
Yiqun Hu; Xing Xie; Wei-Ying Ma; Liang-Tien Chia; Deepu Rajan
Detection of salient regions in images is useful for object-based image retrieval and browsing applications. This task can be accomplished using methods based on the human visual attention model [1], where feature maps corresponding to color, intensity, and orientation capture the corresponding salient regions. In this paper, we propose a strategy for combining the salient regions from the individual feature maps based on a new Composite Saliency Indicator (CSI), which measures the contribution of each feature map to saliency. The method also carries out dynamic weighting of the individual feature maps. The experimental results indicate that this combination strategy reflects the salient regions in an image more accurately.
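A hedged sketch of the combination idea: weight each feature map by how spatially concentrated its saliency mass is, then fuse the weighted maps. The concentration measure used here (mass above mean plus one standard deviation) is illustrative and is not the paper's exact CSI definition.

```python
import numpy as np

def combine_feature_maps(maps):
    """Weight each feature map (e.g. colour, intensity, orientation) by
    how concentrated its saliency mass is, then fuse the normalised maps."""
    weights = []
    for m in maps:
        p = m / (m.sum() + 1e-8)                         # map as a distribution
        weights.append(p[p > p.mean() + p.std()].sum())  # mass in the compact peak
    weights = np.asarray(weights)
    weights = weights / (weights.sum() + 1e-8)           # dynamic per-map weights
    return sum(w * m / (m.max() + 1e-8) for w, m in zip(weights, maps))
```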
ACM Multimedia | 2006
Huan Wang; Song Liu; Liang-Tien Chia
Ontologies are effective for representing domain concepts and relations in the form of a semantic network. Many efforts have been made to incorporate ontologies into information matchmaking and retrieval. This trend is further accelerated by the convergence of various high-level concepts and low-level features supported by ontologies. In this paper, we compare traditional keyword-based image retrieval with the promising ontology-based image retrieval. For completeness, we construct the ontologies not only from text annotations, but also from a combination of text annotations and image features. The experiments are conducted on a medium-sized dataset of about 4,000 images. The results demonstrate the efficacy of utilizing both text and image features in a multi-modality ontology to improve image retrieval.
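To illustrate how an ontology can extend plain keyword matching, the sketch below expands query terms one hop through concept relations before scoring images on annotation overlap plus a low-level feature term; the dictionary-based ontology, fusion weights, and feat_sim callable are all illustrative assumptions, not the paper's system.

```python
def ontology_retrieve(query_terms, ontology, annotations, feat_sim):
    """Rank images by ontology-expanded keyword overlap fused with a
    visual-feature similarity score.  `ontology` maps a concept to its
    related concepts; `annotations` maps image id -> list of tags;
    `feat_sim` maps image id -> similarity in [0, 1]."""
    expanded = set(query_terms)
    for t in query_terms:
        expanded.update(ontology.get(t, []))   # one hop of concept expansion
    scores = {}
    for img, tags in annotations.items():
        text_score = len(expanded & set(tags)) / max(len(expanded), 1)
        scores[img] = 0.7 * text_score + 0.3 * feat_sim(img)  # assumed fusion weights
    return sorted(scores, key=scores.get, reverse=True)
```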
European Conference on Computer Vision | 2010
Zhengxiang Wang; Yiqun Hu; Liang-Tien Chia
Image-To-Class (I2C) distance was first used in the Naive-Bayes Nearest-Neighbor (NBNN) classifier for image classification and has successfully handled datasets with large intra-class variances. However, the performance of this distance relies heavily on a large number of local features in the training set and the test image, which incurs heavy computational cost for nearest-neighbor (NN) search in the testing phase. If only a small number of local features is used to accelerate the NN search, performance degrades. In this paper, we propose a large-margin framework to improve the discrimination of the I2C distance, especially for small numbers of local features, by learning Per-Class Mahalanobis metrics. Our I2C distance adapts to each class by combining with the metric learned for that class. These multiple Per-Class metrics are learned simultaneously by forming a convex optimization problem with the constraint that the I2C distance from each training image to its own class should be less than the distance to any other class by a large margin. A gradient descent method is applied to solve this optimization problem efficiently. To further improve efficiency and performance, we also adopt the ideas of spatial pyramid restriction and I2C distance function learning. We show in experiments that the proposed method significantly outperforms the original NBNN on several prevalent image datasets, and our best results achieve state-of-the-art performance on most datasets.
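A minimal sketch of the I2C distance under a learned per-class Mahalanobis metric M: each local feature of the test image contributes its nearest-neighbour metric distance to the candidate class's feature set. Names and shapes are illustrative, and the metric learning step itself is omitted.

```python
import numpy as np

def i2c_distance(test_feats, class_feats, M):
    """Image-To-Class distance under a per-class Mahalanobis metric M:
    sum, over the test image's local features, of the smallest metric
    distance to any local feature of the candidate class."""
    total = 0.0
    for f in test_feats:
        diffs = class_feats - f                            # all class features vs. f
        dists = np.einsum('ij,jk,ik->i', diffs, M, diffs)  # (x - f)^T M (x - f)
        total += dists.min()                               # nearest neighbour under M
    return total
```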
Computer Vision and Pattern Recognition | 2011
Shenghua Gao; Liang-Tien Chia; Ivor W. Tsang
We present a multi-layer group sparse coding framework for concurrent image classification and annotation. By leveraging the dependency between the image class label and tags, we introduce a multi-layer group sparse structure on the reconstruction coefficients. This structure fully encodes the mutual dependency between the class label, which describes the image content as a whole, and the tags, which describe the components of the image content. We then propose a multi-layer group based tag propagation method, which combines the class label and subgroups of instances with similar tag distributions to annotate test images. Moreover, we extend our multi-layer group sparse coding to the Reproducing Kernel Hilbert Space (RKHS), which captures the nonlinearity of features and further improves the performance of image classification and annotation. Experimental results on the LabelMe, UIUC-Sport, and NUS-WIDE-Object databases show that our method outperforms the baseline methods and achieves excellent performance in both image classification and annotation tasks.
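As a rough illustration of the regularizer's shape, the sketch below evaluates a two-layer group penalty (class-level groups plus nested tag-level subgroups) on a coefficient vector; the exact grouping scheme and weights in the paper may differ.

```python
import numpy as np

def multilayer_group_penalty(s, class_groups, tag_groups, lam1=1.0, lam2=1.0):
    """Two-layer group-sparsity penalty: an L2,1 norm over class-level
    groups of reconstruction coefficients plus an L2,1 norm over tag-level
    subgroups nested inside them.  Each group is a list of indices into s."""
    class_term = sum(np.linalg.norm(s[g]) for g in class_groups)
    tag_term = sum(np.linalg.norm(s[g]) for g in tag_groups)
    return lam1 * class_term + lam2 * tag_term
```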