Publication


Featured research published by Liang Bai.


Multimedia Tools and Applications | 2018

Irrelevance reduction with locality-sensitive hash learning for efficient cross-media retrieval

Yuhua Jia; Liang Bai; Peng Wang; Jinlin Guo; Yuxiang Xie; Tianyuan Yu

Cross-media retrieval is an essential approach to handling the explosive growth of multimodal data on the web. However, existing approaches to cross-media retrieval are computationally expensive due to high dimensionality. To retrieve efficiently from multimodal data, it is essential to reduce the proportion of irrelevant documents. In this paper, we propose a fast cross-media retrieval approach (FCMR) based on locality-sensitive hashing (LSH) and neural networks. One modality of the multimodal data is projected by the LSH algorithm so that similar objects are clustered into the same hash bucket and dissimilar objects into different ones; the other modality is then mapped into these hash buckets using hash functions learned through neural networks. Given a textual or visual query, it can be efficiently mapped to a hash bucket whose stored objects are likely near neighbors of the query. Experimental results show that the proportion of relevant documents among the near neighbors obtained by the proposed method is substantially increased, indicating that retrieval based on near neighbors can be conducted effectively. Further evaluations on two public datasets demonstrate the efficacy of the proposed retrieval method compared to the baselines.
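The hedged sketch below illustrates the random-hyperplane LSH step the abstract describes: one modality's features are hashed into buckets by the signs of random projections, and a query mapped into the same code space retrieves its bucket as candidate near neighbors. The feature dimension, code length, and toy data are assumptions, and the neural-network-learned hash functions for the second modality are omitted.

```python
# Minimal LSH sketch (assumed dims/data); the learned hash functions for the
# second modality, trained with a neural network in the paper, are omitted.
import numpy as np

rng = np.random.default_rng(0)

def lsh_codes(features, hyperplanes):
    # Sign of each random projection gives one bit of the bucket code.
    return (features @ hyperplanes.T > 0).astype(np.uint8)

d, n_bits = 128, 16                            # assumed feature dim / code length
hyperplanes = rng.standard_normal((n_bits, d))

image_feats = rng.standard_normal((1000, d))   # toy stand-in for one modality
codes = lsh_codes(image_feats, hyperplanes)

# A query mapped into the same code space retrieves its bucket's contents
# as candidate near neighbors.
query = rng.standard_normal(d)
q_code = lsh_codes(query[None, :], hyperplanes)[0]
bucket = np.where((codes == q_code).all(axis=1))[0]
print(f"candidates in query bucket: {bucket.size}")
```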


IEEE Transactions on Multimedia | 2018

Bag of Surrogate Parts Feature for Visual Recognition

Yanming Guo; Yu Liu; Songyang Lao; E. Bakker; Liang Bai; Michael S. Lew

Convolutional neural networks (CNNs) have attracted significant attention in visual recognition. Several recent studies have shown that, in addition to the fully connected layers, the features derived from the convolutional layers of CNNs can also achieve promising performance in image classification tasks. In this paper, we propose a new feature from the convolutional layers, called Bag of Surrogate Parts (BoSP), and its spatial variant, Spatial-BoSP (S-BoSP). The main idea is to treat the feature maps in the convolutional layers as surrogate parts, and to densely sample image regions and assign them to these surrogate parts according to their activation values. Together with BoSP/S-BoSP, we propose two further schemes to enhance performance: scale pooling and global-part prediction. Scale pooling aims to handle objects with different scales and deformations, while global-part prediction combines the predictions of global and part features. Extensive experiments on generic-object, fine-grained-object, and scene datasets show that the proposed scheme not only achieves superior performance to the fully connected feature, but also produces competitive or, in some cases, remarkably better performance than the state of the art.
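As a rough illustration of the surrogate-parts idea, the toy sketch below treats each convolutional channel as a "part", assigns every spatial location to the channel with the strongest activation, and histograms the assignments into a BoSP-style descriptor. The layer shape and random activations are assumptions; the paper's dense region sampling, S-BoSP, scale pooling, and global-part prediction are not modeled.

```python
# Toy BoSP-style descriptor (assumed layer shape, random stand-in activations).
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 256, 13, 13                  # assumed conv-layer shape
fmap = rng.random((C, H, W))           # stand-in for real CNN activations

assign = fmap.argmax(axis=0)           # surrogate-part index per location
bosp = np.bincount(assign.ravel(), minlength=C).astype(float)
bosp /= bosp.sum()                     # L1-normalized bag-of-parts histogram
print(bosp.shape)                      # (256,)
```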


Advances in Multimedia | 2016

A Deep Two-Stream Network for Bidirectional Cross-Media Information Retrieval

Tianyuan Yu; Liang Bai; Jinlin Guo; Zheng Yang; Yuxiang Xie

Recent developments in deep learning have shown wide applicability to traditional vision tasks such as image classification and object detection. However, bidirectional retrieval of images and sentences, a fundamental problem in artificial intelligence that connects computer vision and natural language processing, has received less attention than these traditional problems, and results remain far from satisfactory. In this paper, we learn a cross-media representation model with a deep two-stream network. Previous models generally rely on image label information for training or enforce strict correspondences between local features in images and texts. Unlike those models, we learn globalized local features, which reflect the salient objects as well as the details in the images and sentences. After mapping the cross-media data into a common feature space, we use a max-margin criterion to update the network. Experiments on the Flickr8K dataset show that our approach achieves superior performance compared with state-of-the-art methods.
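A minimal sketch of the bidirectional max-margin criterion mentioned above, assuming both modalities have already been mapped into a common feature space; the margin value and toy embeddings are assumptions, not the paper's settings.

```python
# Bidirectional max-margin (hinge) ranking loss over a batch of matched pairs.
import numpy as np

def ranking_loss(img, txt, margin=0.2):
    # Row i of img and row i of txt are assumed to be a matched pair.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = img @ txt.T                                 # cosine similarities
    pos = np.diag(sim)[:, None]                       # matched-pair scores
    cost_i2t = np.maximum(0, margin + sim - pos)      # image -> sentence
    cost_t2i = np.maximum(0, margin + sim - pos.T)    # sentence -> image
    np.fill_diagonal(cost_i2t, 0)
    np.fill_diagonal(cost_t2i, 0)
    return cost_i2t.sum() + cost_t2i.sum()

rng = np.random.default_rng(0)
img, txt = rng.standard_normal((4, 256)), rng.standard_normal((4, 256))
print(ranking_loss(img, txt))
```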


Pacific Rim Conference on Multimedia | 2015

Convolutional Neural Networks Features: Principal Pyramidal Convolution

Yanming Guo; Songyang Lao; Yu Liu; Liang Bai; Shi Liu; Michael S. Lew

The features extracted from convolutional neural networks (CNNs) are able to capture the discriminative part of an image and have shown superior performance in visual recognition. Furthermore, it has been verified that CNN activations trained on large and diverse datasets can act as generic features and be transferred to other visual recognition tasks. In this paper, we aim to learn more from an image and present an effective method called Principal Pyramidal Convolution (PPC). The scheme partitions the image at two levels, extracts CNN activations for each sub-region along with the whole image, and aggregates them together. The concatenated feature is then reduced to the standard dimension using Principal Component Analysis (PCA), generating the refined CNN feature. When applied to image classification and retrieval tasks, the PPC feature consistently outperforms the conventional CNN feature, regardless of the network it derives from. In particular, PPC achieves a state-of-the-art result on the MIT Indoor67 dataset using the activations from Places-CNN.
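The PPC pipeline lends itself to a short sketch: concatenate activations for the whole image and its 2x2 sub-regions, then PCA-reduce the result. Here `cnn_feat` is a hypothetical placeholder for a real CNN extractor, and the dimensions and toy reduction target are assumptions (the paper reduces back to the standard CNN dimension).

```python
# PPC-style feature sketch: whole image + 2x2 sub-regions -> concat -> PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def cnn_feat(region):
    # Hypothetical stand-in for real CNN activations (assumed 4096-D).
    return rng.standard_normal(4096)

def concat_pyramid(image):
    # Level 0: the whole image; level 1: its four quadrants.
    h, w = image.shape[:2]
    regions = [image] + [image[i*h//2:(i+1)*h//2, j*w//2:(j+1)*w//2]
                         for i in range(2) for j in range(2)]
    return np.concatenate([cnn_feat(r) for r in regions])

images = [rng.random((224, 224, 3)) for _ in range(50)]
stacked = np.stack([concat_pyramid(im) for im in images])
pca = PCA(n_components=40).fit(stacked)   # toy target dim; the paper keeps
ppc = pca.transform(stacked)              # the standard CNN dimension
```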


Archive | 2013

Player Detection Algorithm Based on Color Segmentation and Improved CamShift Algorithm

Yanming Guo; Songyang Lao; Liang Bai

Most existing methods for detecting and tracking moving objects fail to produce a good segmentation of the player in dynamic scenes. This chapter therefore puts forward an algorithm based on color segmentation and the CamShift algorithm to detect the tennis player. First, a supervised clustering binary tree is applied to automatically detect the target area. Second, taking the detected area as the initial tracking window, the target is tracked using the improved CamShift algorithm. Building on accurate tracking, the chapter introduces a new method for extracting the player that exploits the typically simple color of the court. The method extracts the player by color features and combines the CamShift algorithm with frame differencing to overcome the tracking-loss problem. Experimental results show that our algorithm is effective in detecting the player in dynamic scenes.
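For orientation, the sketch below shows a plain OpenCV CamShift tracking loop seeded by an initial player box; the video path and the initial window (which the chapter obtains via its clustering-based color segmentation) are hypothetical, and the chapter's improvements and frame-difference fallback are not reproduced.

```python
# Plain OpenCV CamShift tracking loop; "tennis.mp4" and the initial window
# are hypothetical (the chapter derives the window from color segmentation).
import cv2

cap = cv2.VideoCapture("tennis.mp4")
ok, frame = cap.read()

x, y, w, h = 300, 200, 40, 80                      # assumed detected player box
roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([roi], [0], None, [180], [0, 180])   # hue histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    rot_rect, window = cv2.CamShift(backproj, window, term)  # track player
```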


Conference on Multimedia Modeling | 2018

Deep Convolutional Neural Network for Correlating Images and Sentences

Yuhua Jia; Liang Bai; Peng Wang; Jinlin Guo; Yuxiang Xie

In this paper, we address the problem of image-sentence matching and propose a novel convolutional neural network architecture comprising three modules: a visual module that composes fragmental features of images, a textual module that composes fragmental features of sentences, and a fusional module that jointly encodes the image and sentence fragment features to generate final matching scores for image-sentence pairs. Unlike previous fragment-level models, the proposed method represents image fragments as feature maps generated by a CNN, which is more reasonable and effective. By allowing independent, specialized fragmental feature representations to be leveraged for each modality, such as image or text, the proposed method can flexibly interlink the intermediate fragmental features to generate a joint abstraction of the two modalities, which yields better matching scores. Extensive evaluations on two benchmark datasets have validated the competitive performance of our approach compared to state-of-the-art bidirectional image-sentence retrieval approaches.
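As a toy stand-in for the fusional module, the sketch below pools fragment features from each modality and fuses them with a single bilinear form to produce a matching score; the fragment counts, dimensions, and bilinear fusion are assumptions, not the paper's architecture.

```python
# Toy fragment fusion: pooled visual/textual fragments -> bilinear match score.
import numpy as np

rng = np.random.default_rng(0)

def match_score(img_frags, txt_frags, W):
    v = img_frags.mean(axis=0)          # pool visual fragment features
    t = txt_frags.mean(axis=0)          # pool textual fragment features
    return float(v @ W @ t)             # bilinear fusion -> matching score

d_v, d_t = 512, 300                     # assumed fragment dimensions
W = rng.standard_normal((d_v, d_t)) * 0.01

img = rng.standard_normal((49, d_v))    # e.g. a 7x7 conv map as 49 fragments
txt = rng.standard_normal((12, d_t))    # word-level sentence fragments
print(match_score(img, txt, W))
```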


Multimedia Tools and Applications | 2018

Semantically-enhanced kernel canonical correlation analysis: a multi-label cross-modal retrieval

Yuhua Jia; Liang Bai; Shuang Liu; Peng Wang; Jinlin Guo; Yuxiang Xie

To measure inter-media semantic similarities, cross-modal retrieval aligns heterogeneous features in an intermediate common subspace in which they can be reasonably compared, based on a shared understanding of the semantics represented by the different modalities. However, semantics are usually reflected by multiple concepts, since concepts co-occur in the real world rather than occur in isolation. This leads to the more challenging task of multi-label cross-modal retrieval, in which, for example, an image is annotated with multiple concept labels. More importantly, the co-occurrence patterns of concepts result in correlated pairs of labels whose relationships must be considered for accurate cross-modal retrieval. In this paper, we propose multi-label kernel canonical correlation analysis (ml-KCCA), a novel approach to cross-modal retrieval that enhances kernel CCA with the high-level semantic information reflected in multi-label annotations. By kernelizing the correlation extraction from multi-label information, more complex non-linear correlations between different modalities can be measured in order to learn a discriminative subspace that is more suitable for cross-modal retrieval tasks. Extensive evaluations on public datasets have validated the improvements of our approach over state-of-the-art cross-modal retrieval approaches, including other CCA extensions.
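For reference, the sketch below runs plain linear CCA (via scikit-learn) as a simplified stand-in for ml-KCCA: two modalities are projected into a correlated common subspace and ranked by cosine similarity. The kernelization and multi-label enhancement that define ml-KCCA are omitted, and all data are toy.

```python
# Linear CCA as a simplified stand-in for ml-KCCA (kernels/labels omitted).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d_img, d_txt = 200, 128, 64
X = rng.standard_normal((n, d_img))     # toy image features
Y = rng.standard_normal((n, d_txt))     # toy text features

cca = CCA(n_components=10).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)          # aligned common subspace

# Cross-modal retrieval: rank text projections against an image query.
q = X_c[0] / np.linalg.norm(X_c[0])
sims = (Y_c / np.linalg.norm(Y_c, axis=1, keepdims=True)) @ q
ranking = np.argsort(-sims)             # indices of texts, best match first
print(ranking[:5])
```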


Conference on Multimedia Modeling | 2017

Deep Convolutional Neural Network for Bidirectional Image-Sentence Mapping

Tianyuan Yu; Liang Bai; Jinlin Guo; Zheng Yang; Yuxiang Xie

With the rapid development of the Internet and the explosion of data volume, it is important to access cross-media big data, including text, images, audio, and video, efficiently and accurately. However, content heterogeneity and the semantic gap make retrieving such cross-media archives challenging. Existing approaches try to learn the connection between multiple modalities directly from hand-crafted low-level features, and the learned correlations are merely constructed over these feature representations without considering semantic information. To further exploit the intrinsic structure of multimodal data representations, it is essential to build an interpretable correlation between these heterogeneous representations. In this paper, a deep model is proposed that first learns, with a convolutional neural network (CNN), a high-level feature representation shared by different modalities such as texts and images. Moreover, the learned CNN features reflect the salient objects as well as the details in the images and sentences. Experimental results demonstrate that the proposed approach outperforms current state-of-the-art baseline methods on the public Flickr8K dataset.
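A minimal sketch of the shared-space idea: each modality gets its own projection branch whose output lives in one common feature space where the two can be compared directly. The branch form (a single ReLU layer), the dimensions, and the random weights are assumptions; the paper learns these branches on top of CNN features.

```python
# Two modality branches projecting into one assumed common feature space.
import numpy as np

rng = np.random.default_rng(0)

def branch(x, W, b):
    return np.maximum(0.0, x @ W + b)        # one ReLU projection layer

d_img, d_txt, d_common = 4096, 300, 256      # assumed dimensions
Wi, bi = rng.standard_normal((d_img, d_common)) * 0.01, np.zeros(d_common)
Wt, bt = rng.standard_normal((d_txt, d_common)) * 0.01, np.zeros(d_common)

img_common = branch(rng.standard_normal((8, d_img)), Wi, bi)
txt_common = branch(rng.standard_normal((8, d_txt)), Wt, bt)
# Both batches now live in the same 256-D space and can be compared directly,
# e.g. with the cosine similarities used by the ranking loss shown earlier.
```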


Bio-Inspired Computing: Theories and Applications | 2016

Cross-Media Information Retrieval with Deep Convolutional Neural Network

Liang Bai; Tianyuan Yu; Jinlin Guo; Zheng Yang; Yuxiang Xie

With the explosive growth of multimedia data, different types of media data often coexist in web repositories. Accordingly, it is increasingly important to explore the underlying intricate cross-media correlations so as to improve retrieval results from cross-media data. However, effectively discovering the correlations between multimodal data has been a barrier to successful cross-media information retrieval. To address this problem, we propose a novel model that projects both the text modality and the visual modality into a common semantic feature space using convolutional neural network features. Unlike existing approaches, the proposed model learns a high-level feature representation shared by multiple modalities for cross-media information retrieval. Experiments are conducted on a public benchmark dataset, and the results show the effectiveness of our approach.
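Once both modalities share a common space, retrieval quality is typically reported as Recall@K; the sketch below computes it under the assumption that row i of each embedding matrix is a matched image-sentence pair (toy data, no claim about the paper's exact protocol).

```python
# Recall@K for common-space retrieval; rows of img_emb/txt_emb are matched pairs.
import numpy as np

def recall_at_k(img_emb, txt_emb, k=5):
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                        # image-to-sentence similarities
    ranks = (-sim).argsort(axis=1)           # sentences sorted per image query
    hits = (ranks[:, :k] == np.arange(len(img))[:, None]).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
emb_i, emb_t = rng.standard_normal((100, 256)), rng.standard_normal((100, 256))
print(recall_at_k(emb_i, emb_t, k=5))        # ~k/100 for random embeddings
```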


Advances in Multimedia | 2014

A Comparison between Artificial Neural Network and Cascade-Correlation Neural Network in Concept Classification

Yanming Guo; Liang Bai; Songyang Lao; Song Wu; Michael S. Lew

Deep learning has attracted significant attention recently due to promising results in representing and classifying concepts, most prominently in the form of convolutional neural networks (CNNs). While CNNs have been widely studied and evaluated in computer vision, there are other forms of deep learning algorithms that may be promising. One interesting deep learning approach that has received relatively little attention in visual concept classification is the Cascade-Correlation Neural Network (CCNN). In this paper, we create a visual concept retrieval system based on CCNN. Experimental results on the Caltech101 dataset indicate that CCNN outperforms the ANN.
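Cascade-correlation differs from a fixed-topology ANN by growing hidden units one at a time, each trained to correlate with the current residual error and then frozen. The sketch below is a heavily simplified, regression-flavored toy of that loop, with assumed data, unit count, and training schedule; it is not the paper's classification system.

```python
# Toy cascade-correlation loop: grow frozen tanh units that track the residual.
import numpy as np

rng = np.random.default_rng(0)

def train_output(F, y):
    # Least-squares fit of the output weights on the current feature set F.
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

def candidate_unit(F, resid, steps=200, lr=0.1):
    # Gradient ascent on the covariance between a tanh unit and the residual.
    v = rng.standard_normal(F.shape[1]) * 0.1
    for _ in range(steps):
        h = np.tanh(F @ v)
        cov = (h - h.mean()) @ (resid - resid.mean())
        grad = F.T @ (np.sign(cov) * (resid - resid.mean()) * (1 - h ** 2))
        v += lr * grad / len(h)
    return v

X = rng.standard_normal((200, 8))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)  # toy regression target

F = np.hstack([X, np.ones((200, 1))])                 # inputs plus bias
for _ in range(3):                                    # grow three hidden units
    resid = y - F @ train_output(F, y)
    v = candidate_unit(F, resid)
    F = np.hstack([F, np.tanh(F @ v)[:, None]])       # freeze unit, append output
w = train_output(F, y)
print("final MSE:", np.mean((y - F @ w) ** 2))
```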

Collaboration


Dive into Liang Bai's collaborations.

Top Co-Authors

Yuxiang Xie
National University of Defense Technology

Songyang Lao
National University of Defense Technology

Tianyuan Yu
National University of Defense Technology

Jinlin Guo
National University of Defense Technology

Yuhua Jia
National University of Defense Technology

Zheng Yang
National University of Defense Technology