Publications


Featured research published by Baoguang Shi.


IEEE Transactions on Geoscience and Remote Sensing | 2017

AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification

Gui-Song Xia; Jingwen Hu; Fan Hu; Baoguang Shi; Xiang Bai; Yanfei Zhong; Liangpei Zhang; Xiaoqiang Lu

Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area, and numerous algorithms, including many machine learning and data-driven approaches, have been proposed for it. However, the existing data sets for aerial scene classification, such as the UC-Merced data set and WHU-RS19, are relatively small, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image data set (AID): a large-scale data set for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. To create AID, we collect and annotate more than 10,000 aerial scene images. In addition, we give a comprehensive review of the existing aerial scene classification techniques as well as recently popular deep learning methods. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.
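
A minimal sketch of the kind of deep learning baseline the paper evaluates: fine-tuning an ImageNet-pretrained CNN on AID. This is not the authors' code; the directory layout (AID/<scene-class>/<image>.jpg), the ResNet-50 backbone, and the hyperparameters are all assumptions for illustration.

```python
# Hypothetical baseline: fine-tune a pretrained ResNet-50 on AID.
# Assumes the data set is unpacked as AID/<scene-class>/<image>.jpg.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("AID", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # one output per scene class

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```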


Computer Vision and Pattern Recognition | 2017

Detecting Oriented Text in Natural Images by Linking Segments

Baoguang Shi; Xiang Bai; Serge J. Belongie

Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Both elements are detected densely at multiple scales by an end-to-end trained, fully convolutional neural network. Final detections are produced by combining segments connected by links. Compared with previous methods, SegLink improves along the dimensions of accuracy, speed, and ease of training. It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin. It runs at over 20 FPS on 512x512 images. Moreover, without modification, SegLink is able to detect long lines of non-Latin text, such as Chinese.
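
The combining step lends itself to a union-find sketch: segments are nodes, links are edges, and each connected component becomes one detection. This is an illustrative reconstruction, not the authors' code; the paper merges oriented boxes, whereas for brevity this sketch emits a simple axis-aligned hull per group.

```python
# Illustrative sketch of SegLink's combining step: group segments that are
# joined by links, then emit one detection per connected component.
# Real SegLink merges oriented boxes; an axis-aligned hull is used here for brevity.

def combine_segments(segments, links):
    """segments: list of (x_min, y_min, x_max, y_max) boxes.
    links: list of (i, j) index pairs predicted to belong together."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)  # union the two components

    groups = {}
    for idx, seg in enumerate(segments):
        groups.setdefault(find(idx), []).append(seg)

    detections = []
    for segs in groups.values():
        xs0, ys0, xs1, ys1 = zip(*segs)
        detections.append((min(xs0), min(ys0), max(xs1), max(ys1)))
    return detections

# Example: three segments, the first two linked into one word.
print(combine_segments(
    [(0, 0, 10, 5), (8, 0, 20, 5), (40, 10, 55, 16)],
    [(0, 1)],
))  # -> [(0, 0, 20, 5), (40, 10, 55, 16)]
```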


International Conference on Pattern Recognition | 2016

Scene text script identification with Convolutional Recurrent Neural Networks

Jieru Mei; Luo Dai; Baoguang Shi; Xiang Bai

Script identification for scene text images is a challenging task. This paper describes a novel deep neural network structure that efficiently identifies the scripts of such images. In our design, we exploit two important factors: the image representation and the spatial dependencies within text lines. To this end, we bring together a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) into one end-to-end trainable network. The former generates rich image representations, while the latter effectively analyzes long-term spatial dependencies. In addition, we place an average pooling layer on top of the structure in order to handle input images of arbitrary sizes. Experiments on several datasets, including SIW-13 and CVSI2015, demonstrate that our approach achieves superior performance compared with previous approaches.
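
A minimal PyTorch sketch of the CNN + RNN + average-pooling pipeline described above: the CNN collapses image height and keeps width as the sequence axis, a bidirectional LSTM models dependencies along the text line, and averaging over the width axis handles arbitrary-width inputs. Layer sizes are placeholders, not the paper's configuration.

```python
# Sketch of the CNN + RNN + average-pooling design; sizes are placeholders.
import torch
import torch.nn as nn

class ScriptIdNet(nn.Module):
    def __init__(self, num_scripts=13):  # e.g., the 13 scripts of SIW-13
        super().__init__()
        # CNN: collapses height to 1, keeps width as the sequence axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # height -> 1, width preserved
        )
        self.rnn = nn.LSTM(128, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_scripts)

    def forward(self, x):                     # x: (batch, 1, H, W), W arbitrary
        f = self.cnn(x)                       # (batch, 128, 1, W')
        f = f.squeeze(2).transpose(1, 2)      # (batch, W', 128)
        h, _ = self.rnn(f)                    # (batch, W', 256)
        h = h.mean(dim=1)                     # average pooling over the width axis
        return self.fc(h)                     # (batch, num_scripts)

logits = ScriptIdNet()(torch.randn(2, 1, 32, 100))  # works for any input width
```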


IEEE Transactions on Neural Networks | 2018

Face Alignment With Deep Regression

Baoguang Shi; Xiang Bai; Wenyu Liu; Jingdong Wang

In this paper, we present a deep regression approach for face alignment. The deep regressor is a neural network that consists of a global layer and multistage local layers. The global layer estimates the initial face shape from the whole image, while the following local layers iteratively update the shape from local image observations. Combining standard derivations and numerical approximations, we make all layers able to backpropagate error differentials, so that we can apply standard backpropagation to jointly learn the parameters of all layers. We show that the resulting deep regressor gradually and evenly approaches the true facial landmarks stage by stage, avoiding the tendency, common in cascaded regression methods, that deteriorates overall performance: early-stage regressors yield large gains in alignment accuracy while later-stage regressors yield only small ones. Experimental results on standard benchmarks demonstrate that our approach brings significant improvements over previous cascaded regression algorithms.
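
A structural sketch of the global-then-local design described above, assuming 68 landmarks and flattened (x, y) shape vectors. The local feature extraction around the current landmarks is reduced to a stub here; in the paper it is differentiable, which is what lets errors backpropagate through every stage.

```python
# Structural sketch of the global + multistage-local regressor; dimensions
# and the local-feature stub are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

NUM_LANDMARKS, FEAT_DIM, STAGES = 68, 128, 4

class DeepRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        # Global layer: whole image -> initial shape estimate.
        self.global_layer = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64, 2 * NUM_LANDMARKS))
        # Local stages: features sampled around the current shape -> shape update.
        self.local_stages = nn.ModuleList(
            nn.Linear(FEAT_DIM, 2 * NUM_LANDMARKS) for _ in range(STAGES))

    def local_features(self, image, shape):
        # Stub for sampling local patches at the current landmarks; in the
        # paper this step is differentiable so all layers train jointly.
        return torch.randn(image.size(0), FEAT_DIM)

    def forward(self, image):              # image: (batch, 1, 64, 64)
        shape = self.global_layer(image)   # initial estimate from the whole image
        for stage in self.local_stages:    # iterative local refinement
            shape = shape + stage(self.local_features(image, shape))
        return shape

shapes = DeepRegressor()(torch.randn(2, 1, 64, 64))  # (2, 136)
```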


International Conference on Pattern Recognition | 2016

Distinguishing text/non-text natural images with Multi-Dimensional Recurrent Neural Networks

Pengyuan Lyu; Baoguang Shi; Chengquan Zhang; Xiang Bai

In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from the many natural images that do not. To this end, we propose a novel neural network architecture, termed the Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text from non-text images by classifying local image blocks, taking both region pixels and the dependencies among blocks into account. The network is composed of a Convolutional Neural Network (CNN) and a Multi-Dimensional Recurrent Neural Network (MDRNN). The CNN extracts a rich, high-level image representation, while the MDRNN analyzes dependencies along multiple directions and produces block-level predictions. Evaluating CMDRNN on a public dataset, we observe improvements over prior art in terms of both speed and accuracy.
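
A sketch of the block-level design: a CNN turns the image into a grid of block features, recurrent layers propagate context across the grid, and each block receives a text/non-text score. Note the simplification: the paper's MDRNN scans in multiple directions, whereas this sketch approximates the cross-block dependencies with bidirectional LSTMs run along rows and then columns.

```python
# Simplified sketch of block-level text/non-text prediction. The MDRNN's
# multi-directional scan is approximated with row-wise then column-wise BiLSTMs.
import torch
import torch.nn as nn

class BlockTextClassifier(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        self.cnn = nn.Sequential(  # each output cell covers one image block
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.row_rnn = nn.LSTM(feat, feat // 2, bidirectional=True, batch_first=True)
        self.col_rnn = nn.LSTM(feat, feat // 2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(feat, 2)  # text / non-text score per block

    def forward(self, x):                                # (B, 3, H, W)
        f = self.cnn(x)                                  # (B, C, h, w) block grid
        B, C, h, w = f.shape
        f = f.permute(0, 2, 3, 1)                        # (B, h, w, C)
        f, _ = self.row_rnn(f.reshape(B * h, w, C))      # context along each row
        f = f.reshape(B, h, w, C).transpose(1, 2)        # (B, w, h, C)
        f, _ = self.col_rnn(f.reshape(B * w, h, C))      # context along each column
        f = f.reshape(B, w, h, C).transpose(1, 2)        # back to (B, h, w, C)
        return self.head(f)                              # (B, h, w, 2) block scores

scores = BlockTextClassifier()(torch.randn(1, 3, 64, 64))  # (1, 16, 16, 2)
```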


Neurocomputing | 2018

VD-SAN: Visual-Densely Semantic Attention Network for Image Caption Generation

Xinwei He; Yang Yang; Baoguang Shi; Xiang Bai

Recently, attributes have demonstrated their effectiveness in guiding image captioning systems. However, most attribute-based image captioning methods treat attribute prediction as a separate task and rely on a standalone stage to obtain the attributes for a given image; e.g., a pre-trained network such as a Fully Convolutional Neural Network (FCN) is usually adopted. Inherently, they ignore the correlation between the attribute prediction task and the image representation extraction task, and at the same time increase the complexity of the image captioning system. In this paper, we aim to couple the attribute prediction stage and the image representation extraction stage tightly, and propose a novel and efficient image captioning framework called the Visual-Densely Semantic Attention Network (VD-SAN). In particular, the whole captioning system consists of shared convolutional layers from a Dense Convolutional Network (DenseNet), which are further split into a semantic attribute prediction branch and an image feature extraction branch, two semantic attention models, and a long short-term memory network (LSTM) for caption generation. To evaluate the proposed architecture, we construct the Flickr30K-ATT and MS-COCO-ATT datasets based on the popular image caption datasets Flickr30K and MS COCO, respectively, where each image is annotated with an attribute list in addition to the corresponding caption. Empirical results demonstrate that our captioning system achieves significant improvements over state-of-the-art approaches.
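
A structural sketch of the shared-trunk, two-branch idea: one convolutional trunk feeds both an attribute prediction head and an image feature head, and both outputs condition an LSTM captioner. All names and dimensions are placeholders, the trunk is a stand-in for the shared DenseNet layers, and the paper's two semantic attention models are reduced to a crude additive term for brevity.

```python
# Structural sketch of a shared trunk split into an attribute branch and an
# image-feature branch feeding an LSTM captioner; attention models omitted.
import torch
import torch.nn as nn

VOCAB, ATTRS, DIM = 10000, 256, 512

class TwoBranchCaptioner(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(            # stand-in for shared DenseNet layers
            nn.Conv2d(3, DIM, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.attr_head = nn.Linear(DIM, ATTRS)  # multi-label attribute scores
        self.feat_head = nn.Linear(DIM, DIM)    # image representation
        self.embed = nn.Embedding(VOCAB, DIM)
        self.lstm = nn.LSTMCell(DIM, DIM)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, image, tokens):           # tokens: (B, T) caption prefix
        shared = self.trunk(image)               # one forward pass, two branches
        attrs = torch.sigmoid(self.attr_head(shared))
        h = self.feat_head(shared)                # hidden state initialized from image
        c = torch.zeros_like(h)
        logits = []
        for t in range(tokens.size(1)):
            # A full model would attend over attrs here; we simply project and add.
            x = self.embed(tokens[:, t]) + attrs @ self.attr_head.weight
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)         # (B, T, VOCAB)

out = TwoBranchCaptioner()(torch.randn(2, 3, 64, 64),
                           torch.randint(0, VOCAB, (2, 5)))
```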


National Conference on Artificial Intelligence | 2016

TextBoxes: A Fast Text Detector with a Single Deep Neural Network.

Minghui Liao; Baoguang Shi; Xiang Bai; Xinggang Wang; Wenyu Liu


IEEE Transactions on Image Processing | 2018

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Minghui Liao; Baoguang Shi; Xiang Bai


International Conference on Document Analysis and Recognition | 2017

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

Baoguang Shi; Cong Yao; Minghui Liao; Mingkun Yang; Pei Xu; Linyan Cui; Serge J. Belongie; Shijian Lu; Xiang Bai


International Conference on Document Analysis and Recognition | 2017

ICDAR2017 Robust Reading Challenge on COCO-Text

Raul Gomez; Baoguang Shi; Lluis Gomez; Lukas Neumann; Andreas Veit; Jiri Matas; Serge J. Belongie; Dimosthenis Karatzas

Collaboration


Baoguang Shi's top co-authors, all at Huazhong University of Science and Technology: Xiang Bai, Minghui Liao, Cong Yao, Pengyuan Lyu, Wenyu Liu, Xinggang Wang, Xinwei He, and Zhen Zhu.