Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lingqiao Liu is active.

Publication


Featured research published by Lingqiao Liu.


International Conference on Computer Vision | 2011

In defense of soft-assignment coding

Lingqiao Liu; Lei Wang; Xinwang Liu

In object recognition, soft-assignment coding enjoys computational efficiency and conceptual simplicity. However, its classification performance is inferior to the newly developed sparse or local coding schemes. It would be highly desirable if its classification performance could become comparable to the state-of-the-art, yielding a coding scheme that combines computational efficiency with strong classification performance. To achieve this, we revisit soft-assignment coding from two key aspects: classification performance and probabilistic interpretation. For the first aspect, we argue that the inferiority of soft-assignment coding is due to its neglect of the underlying manifold structure of local features. To remedy this, we propose a simple modification that localizes the soft-assignment coding, which surprisingly achieves comparable or even better performance than existing sparse or local coding schemes while maintaining its computational advantage. For the second aspect, based on our probabilistic interpretation of soft-assignment coding, we give a probabilistic explanation of the max-pooling operation, which has been used successfully by sparse and local coding schemes but is still poorly understood. This probabilistic explanation motivates us to develop a new mix-order max-pooling operation which further improves the classification performance of the proposed coding scheme. As experimentally demonstrated, the localized soft-assignment coding achieves state-of-the-art classification performance with the highest computational efficiency among the existing coding schemes.
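
As a concrete illustration, here is a minimal NumPy sketch of the localization idea described above: each local feature is softly assigned only to its k nearest codewords rather than to the whole codebook (the parameter names beta and k are illustrative, not the paper's notation).

```python
import numpy as np

def localized_soft_assignment(x, codebook, beta=10.0, k=5):
    """Sketch of localized soft-assignment coding for one local feature.

    x: (d,) local feature; codebook: (K, d) visual words.
    Only the k nearest codewords receive non-zero weights; beta is the
    smoothing factor of the soft-assignment (Gaussian) kernel.
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)  # squared distances to all words
    nn = np.argsort(d2)[:k]                   # indices of the k nearest words
    code = np.zeros(codebook.shape[0])
    w = np.exp(-beta * d2[nn])                # kernel responses on the neighborhood
    code[nn] = w / w.sum()                    # normalize within the neighborhood
    return code
```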


Computer Vision and Pattern Recognition | 2016

What Value Do Explicit High Level Concepts Have in Vision to Language Problems?

Qi Wu; Chunhua Shen; Lingqiao Liu; Anthony R. Dick; Anton van den Hengel

Much recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we investigate whether this direct approach succeeds due to, or despite, the fact that it avoids the explicit representation of high-level information. We propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We also show that the same mechanism can be used to introduce external semantic information and that doing so further improves performance. We achieve the best reported results on both image captioning and VQA on several benchmark datasets, and provide an analysis of the value of explicit high-level concepts in V2L problems.
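
To make the mechanism concrete, the following is a hedged PyTorch sketch of one way to feed an explicit concept vector (multi-label attribute probabilities predicted from the image) into an RNN language model, in the spirit of the approach described above. The class name, layer sizes, and initialization scheme are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConceptToCaption(nn.Module):
    """Illustrative model: high-level concept probabilities initialize the
    LSTM state of a caption generator, instead of raw CNN features."""

    def __init__(self, n_concepts, vocab_size, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(n_concepts, hidden)  # concepts -> initial state
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, concept_probs, captions):
        # concept_probs: (B, n_concepts), captions: (B, T) token ids
        h0 = torch.tanh(self.init_h(concept_probs)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        x = self.embed(captions)
        y, _ = self.rnn(x, (h0, c0))
        return self.out(y)  # (B, T, vocab_size) next-word logits
```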


Computer Vision and Pattern Recognition | 2015

The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification

Lingqiao Liu; Chunhua Shen; Anton van den Hengel

A number of recent studies have shown that a Deep Convolutional Neural Network (DCNN) pretrained on a large dataset can be adopted as a universal image descriptor, and that doing so leads to impressive performance on a range of image classification tasks. Most, if not all, of these studies adopt activations of the fully-connected layer of a DCNN as the image or region representation, and it is believed that convolutional layer activations are less discriminative. This paper, however, advocates that, if used appropriately, convolutional layer activations constitute a powerful image representation. This is achieved through a new technique proposed in this paper called cross-convolutional-layer pooling. More specifically, it extracts subarrays of feature maps of one convolutional layer as local features, and pools the extracted features with the guidance of the feature maps of the successive convolutional layer. Compared with existing methods that apply DCNNs in a similar local-feature setting, the proposed method avoids the input-image style-mismatch issue usually encountered when applying fully-connected layer activations to describe local regions. The proposed method is also easier to implement, since it is codebook-free and has no tuning parameters. By applying our method to four popular visual classification tasks, we demonstrate that it can achieve comparable, or in some cases significantly better, performance than existing fully-connected-layer-based image representations.
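
The pooling operation itself is simple to express. Below is a minimal NumPy sketch of cross-convolutional-layer pooling as described above, assuming the caller has already resized the two layers' feature maps to the same spatial size (function and argument names are illustrative).

```python
import numpy as np

def cross_layer_pooling(feat_t, feat_t1):
    """feat_t:  (H, W, Dt)  activations of convolutional layer t (local features).
    feat_t1: (H, W, Dt1) activations of the successive layer t+1.

    Each channel of layer t+1 serves as a spatial weighting over the layer-t
    local features; the Dt1 pooled vectors are concatenated into one descriptor.
    """
    H, W, Dt = feat_t.shape
    Dt1 = feat_t1.shape[2]
    X = feat_t.reshape(H * W, Dt)    # (N, Dt) local features
    G = feat_t1.reshape(H * W, Dt1)  # (N, Dt1) per-channel pooling guidance
    pooled = G.T @ X                 # (Dt1, Dt): weighted sum per guidance channel
    return pooled.reshape(-1)        # final image representation
```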


Computer Vision and Pattern Recognition | 2015

Mid-level deep pattern mining

Yao Li; Lingqiao Liu; Chunhua Shen; Anton van den Hengel

Mid-level visual element discovery aims to find clusters of image patches that are both representative and discriminative. In this work, we study this problem from the perspective of pattern mining while relying on the recently popularized Convolutional Neural Networks (CNNs). Specifically, we find that for an image patch, activations extracted from the first fully-connected layer of a CNN have two appealing properties which enable their seamless integration with pattern mining. Patterns are then discovered from a large number of CNN activations of image patches through well-known association rule mining. When we retrieve and visualize image patches with the same pattern (see Fig. 1), surprisingly, they are not only visually similar but also semantically consistent. We apply our approach to scene and object classification tasks, and demonstrate that it outperforms all previous works on mid-level visual element discovery by a sizeable margin, with far fewer elements being used. Our approach also outperforms or matches recent works using CNNs for these tasks. Source code of the complete system is available online.
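
A key step in marrying CNN activations with pattern mining is converting each patch's activation into a "transaction" of discrete items. A hedged sketch of that conversion, keeping the indices of the most strongly activated dimensions (the cutoff k is an illustrative choice, not the paper's setting):

```python
import numpy as np

def activations_to_transactions(acts, k=20):
    """acts: (N, D) first fully-connected-layer activations of N patches.

    Each transaction is the set of indices of the k most strongly activated
    dimensions of one patch; frequent itemsets over these transactions
    correspond to candidate mid-level patterns for association rule mining.
    """
    return [set(np.argsort(a)[-k:]) for a in acts]
```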


Computer Vision and Pattern Recognition | 2016

Less is More: Zero-Shot Learning from Online Textual Documents with Noise Suppression

Ruizhi Qiao; Lingqiao Liu; Chunhua Shen; Anton van den Hengel

Classifying a visual concept merely from its associated online textual source, such as a Wikipedia article, is an attractive research topic in zero-shot learning because it alleviates the burden of manually collecting semantic attributes. Recent work has pursued this approach by exploring various ways of connecting the visual and text domains. In this paper, we revisit this idea by going further to consider one important factor: the textual representation is usually too noisy for the zero-shot learning application. This observation motivates us to design a simple yet effective zero-shot learning method capable of suppressing noise in the text. Specifically, we propose an ℓ2,1-norm-based objective function which can simultaneously suppress the noisy signal in the text and learn a function to match the text document and visual features. We also develop an optimization algorithm to efficiently solve the resulting problem. By conducting experiments on two large datasets, we demonstrate that the proposed method significantly outperforms competing methods that rely on online information sources but apply no explicit noise suppression. Furthermore, we provide an in-depth analysis of the proposed method and insight into what kind of information in documents is useful for zero-shot learning.
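
For readers unfamiliar with the norm, the ℓ2,1 norm sums the ℓ2 norms of a matrix's rows, so penalizing it drives entire rows to zero. A generic objective of this family (an illustrative sketch, not necessarily the paper's exact formulation) looks like

\[
\min_{W}\; \|V - T W\|_F^2 + \lambda \|W\|_{2,1},
\qquad
\|W\|_{2,1} = \sum_{i=1}^{d} \Big( \sum_{j} W_{ij}^2 \Big)^{1/2},
\]

where T ∈ R^{n×d} holds the text features, V the visual features to be matched, and W the learned matching matrix. Because each row of W corresponds to one text dimension, the ℓ2,1 penalty discards noisy text dimensions wholesale rather than merely shrinking individual entries.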


IEEE Transactions on Circuits and Systems for Video Technology | 2017

Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition

Peng Wang; Yuanzhouhan Cao; Chunhua Shen; Lingqiao Liu; Heng Tao Shen

Encouraged by the success of convolutional neural networks (CNNs) in image classification, much recent effort has been spent on applying CNNs to video-based action recognition. One challenge is that a video contains a varying number of frames, which is incompatible with the standard input format of CNNs. Existing methods handle this issue either by directly sampling a fixed number of frames, or by bypassing it with a 3D convolutional layer, which conducts convolution in the spatial-temporal domain. In this paper, we propose a novel network structure which allows an arbitrary number of frames as the network input. The key to our solution is to introduce a module consisting of an encoding layer and a temporal pyramid pooling layer. The encoding layer maps the activations from the previous layers to a feature vector suitable for pooling, whereas the temporal pyramid pooling layer converts multiple frame-level activations into a fixed-length video-level representation. In addition, we adopt a feature concatenation layer that combines appearance and motion information. Compared with the frame-sampling strategy, our method avoids the risk of missing any important frames. Compared with the 3D convolutional method, which requires a huge video dataset for network training, our model can be learned on a small target dataset because we can leverage an off-the-shelf image-level CNN for model parameter initialization. Experiments on three challenging datasets, Hollywood2, HMDB51, and UCF101, demonstrate the effectiveness of the proposed network.
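
The fixed-length property comes entirely from the pooling module. Here is a minimal NumPy sketch of temporal pyramid pooling over frame-level features (the pyramid levels are an illustrative choice):

```python
import numpy as np

def temporal_pyramid_pooling(frame_feats, levels=(1, 2, 4)):
    """frame_feats: (T, D) features for a video with an arbitrary number T of frames.

    For each pyramid level L, frames are split into L contiguous temporal
    segments and max-pooled within each segment; the results are concatenated
    into a fixed-length vector of size D * sum(levels), independent of T.
    """
    T, _ = frame_feats.shape
    pooled = []
    for L in levels:
        bounds = np.linspace(0, T, L + 1).astype(int)
        for i in range(L):
            end = max(bounds[i] + 1, bounds[i + 1])  # guard against empty segments
            pooled.append(frame_feats[bounds[i]:end].max(axis=0))
    return np.concatenate(pooled)
```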


Computer Vision and Pattern Recognition | 2017

Graph-Structured Representations for Visual Question Answering

Damien Teney; Lingqiao Liu; Anton van den Hengel

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the requirement for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. We show that this approach achieves significant improvements over the state-of-the-art, increasing accuracy from 71.2% to 74.4% on the abstract-scenes multiple-choice benchmark, and from 34.7% to 39.1% on pairs of balanced scenes, i.e., images with fine-grained differences and opposite yes/no answers to the same question.
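
As a rough illustration of why pairwise structure helps, the sketch below scores every object-word pair and pools object features by their joint relevance to the question; this is a generic attention-over-pairs construction in the spirit of the paper, not its exact architecture.

```python
import torch

def pairwise_graph_pooling(obj_feats, word_feats):
    """obj_feats: (No, D) scene-object node features.
    word_feats: (Nw, D) question-word node features.

    Affinities between all object-word pairs produce attention weights,
    so multiple instances of the same object can be weighted separately,
    which a single monolithic image vector cannot do.
    """
    scores = obj_feats @ word_feats.T                  # (No, Nw) pair affinities
    attn = torch.softmax(scores.flatten(), dim=0).reshape_as(scores)
    pooled = (attn @ word_feats) * obj_feats           # (No, D) question-gated objects
    return pooled.sum(dim=0)                           # (D,) joint representation
```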


Pattern Recognition | 2014

HEp-2 cell image classification with multiple linear descriptors

Lingqiao Liu; Lei Wang

The automatic classification of HEp-2 cell stain patterns from indirect immunofluorescence images has attracted much attention recently. As an image classification problem, it can be well solved by the state-of-the-art bag-of-features (BoF) model as long as a suitable local descriptor is known. Unfortunately, for this special task, we have very limited knowledge of such a descriptor. In this paper, we explore the possibility of automatically learning the descriptor from the image data itself. Specifically, we assume that a local patch can be well described by a set of linear projections performed on its pixel values. Based on this assumption, both unsupervised and supervised approaches are explored for learning the projections. More importantly, we propose a multi-projection-multi-codebook scheme which creates multiple linear-projection descriptors and multiple image representation channels, with each channel corresponding to one descriptor. Through our analysis, we show that the image representation obtained by combining these different channels can be more discriminative than that obtained from a single-projection scheme. This analysis is further verified by our experimental study. We evaluate the proposed approach by strictly following the protocol suggested by the organizer of the 2012 HEp-2 cell classification contest, which was hosted to compare the state-of-the-art methods for HEp-2 cell classification. Our system achieves 66.6% cell-level classification accuracy, which is just slightly lower than the best performance achieved in the contest. This result is impressive and promising considering that we only utilize a single type of feature (namely, linear projection coefficients of patch pixel values) which is learned from the image data.
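
For the unsupervised variant, learning linear projections of patch pixel values can be as simple as PCA; the following NumPy sketch is one plausible instantiation of a single projection channel (the paper also explores supervised learning of the projections, which is not shown here).

```python
import numpy as np

def learn_projection_descriptor(patches, n_proj=16):
    """patches: (N, p) matrix of N vectorized image patches.

    Learns n_proj linear projections (here: the top principal directions of
    the patch pixel values) and returns the projection coefficients, which
    serve as the local descriptor fed to the bag-of-features pipeline.
    """
    mu = patches.mean(axis=0)
    X = patches - mu
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_proj]            # (n_proj, p) projection matrix
    return X @ P.T             # (N, n_proj) descriptors
```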


IEEE Transactions on Systems, Man, and Cybernetics | 2013

An Adaptive Approach to Learning Optimal Neighborhood Kernels

Xinwang Liu; Jianping Yin; Lei Wang; Lingqiao Liu; Jun Liu; Chenping Hou; Jian Zhang

Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a “pre-specified” kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and as a linear combination of a set of base kernels in the manner of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem which can be efficiently solved by employing the extended level method and Nesterov's method. We also give a probabilistic interpretation for our approach and apply it to explain existing kernel learning methods, providing another perspective on their commonalities and differences. Comprehensive experimental results on 13 UCI data sets and two additional real-world data sets show that, via the joint learning process, our approach not only adaptively identifies the pre-specified kernel but also achieves classification performance superior to the original ONKL and the related MKL algorithms.
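
One plausible reading of the joint formulation described above, written as a min-max problem (an illustrative sketch; the paper's exact constraints and regularizers may differ):

\[
\min_{\theta}\; \min_{K \succeq 0}\; \max_{\alpha \in \mathcal{A}}\; J(\alpha, K)
\quad \text{s.t.} \quad \|K - K_0(\theta)\|_F^2 \le r^2,
\]

where J(α, K) is the SVM dual objective, K is the optimal neighborhood kernel, and K_0(θ) is the pre-specified kernel, now parameterized by θ (e.g., a Gaussian RBF width, or MKL combination weights over base kernels) and learned jointly rather than fixed in advance.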


Neurocomputing | 2012

Incorporation of radius-info can be simple with SimpleMKL

Xinwang Liu; Lei Wang; Jianping Yin; Lingqiao Liu

Recent research has shown the benefit of incorporating the radius of the Minimal Enclosing Ball (MEB) of the training data into Multiple Kernel Learning (MKL). However, straightforwardly incorporating this radius leads to a complex learning structure and considerably increased computation. Moreover, the notorious sensitivity of this radius to outliers can adversely affect MKL. In this paper, instead of directly incorporating the radius of the MEB, we incorporate its close relative, the trace of the data scattering matrix, to avoid these problems. By analyzing the characteristics of the resulting optimization, we show that the benefit of incorporating the radius of the MEB can be fully retained. More importantly, our algorithm can be effortlessly realized within an existing MKL framework such as SimpleMKL; the only difference is the way the basic kernels are normalized. Although this kernel normalization is not our invention, our theoretical derivation uncovers why it can achieve better classification performance, which had not previously been explained in the literature. As experimentally demonstrated, our method achieves the overall best learning performance in various settings. From another perspective, our work improves SimpleMKL to utilize the information of the radius of the MEB in an efficient and practical way.
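
The abstract notes that the only change relative to standard SimpleMKL is how the base kernels are normalized. Below is a hedged NumPy sketch of one such normalization, dividing each kernel by the trace of the scattering matrix it induces in feature space; the division-by-trace form is our assumption here, though the centred-kernel identity for tr(S) is standard.

```python
import numpy as np

def normalize_by_scatter_trace(K):
    """K: (n, n) base kernel matrix.

    In the kernel-induced feature space, the trace of the data scattering
    matrix is tr(S) = tr(K)/n - sum(K)/n^2 (the centred-kernel identity).
    Per the abstract above, this trace is a cheap, outlier-robust relative
    of the MEB radius.
    """
    n = K.shape[0]
    scatter_trace = np.trace(K) / n - K.sum() / n**2
    return K / scatter_trace
```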

Collaboration


Dive into Lingqiao Liu's collaborations.

Top Co-Authors

Lei Wang (Information Technology University)
Peng Wang (University of Adelaide)
Ian D. Reid (University of Adelaide)
Luping Zhou (Information Technology University)
Heng Tao Shen (University of Electronic Science and Technology of China)
Chao Wang (University of Wollongong)
Yao Li (University of Adelaide)