Yalong Bai

Harbin Institute of Technology

Publication

Featured research published by Yalong Bai.

ACM Multimedia | 2014

Bag-of-Words Based Deep Neural Network for Image Retrieval

Yalong Bai; Wei Yu; Tianjun Xiao; Chang Xu; Kuiyuan Yang; Wei-Ying Ma; Tiejun Zhao

This work targets the image retrieval task held by the MSR-Bing Grand Challenge. Image retrieval is considered a challenging task because of the gap between low-level image representation and high-level textual query representation. Recent advances in deep neural networks (DNNs) shed light on narrowing this gap by learning high-level image representations from raw pixels. In this paper, we propose a bag-of-words based deep neural network for the image retrieval task, which learns a high-level image representation and maps images into a bag-of-words space. The DNN model is trained on large-scale clickthrough data. The relevance between a query and an image is measured by the cosine similarity between the query's bag-of-words representation and the image's bag-of-words representation predicted by the DNN, and the visual similarity of images is computed from the high-level image representation extracted by the same DNN model. Finally, the PageRank algorithm is used to further improve the ranking list by considering the visual similarity of images for each query. The experimental results achieved state-of-the-art performance and verified the effectiveness of the proposed method.
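The relevance step described in this abstract can be illustrated with a minimal sketch. It assumes the query and each image are already available as sparse bag-of-words vectors (word to weight); the function names, the toy vocabulary, and the weights are hypothetical, and neither the DNN that predicts the image vectors nor the PageRank re-ranking step is reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse bag-of-words vectors (dict: word -> weight)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_images(query_bow, predicted_bows):
    """Return image ids sorted by decreasing cosine similarity to the query."""
    scores = {
        image_id: cosine_similarity(query_bow, bow)
        for image_id, bow in predicted_bows.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: in the paper, the image vectors would be predicted by the DNN.
query = {"red": 1.0, "car": 1.0}
predicted = {
    "img_a": {"red": 0.9, "car": 0.8, "road": 0.1},
    "img_b": {"dog": 0.9, "grass": 0.5},
    "img_c": {"car": 0.7, "blue": 0.6},
}
ranking = rank_images(query, predicted)
```

In this toy setup the image sharing both query words ranks first and the image sharing none ranks last.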

British Machine Vision Conference | 2014

DNN Flow: DNN Feature Pyramid based Image Matching.

Wei Yu; Kuiyuan Yang; Yalong Bai; Hongxun Yao; Yong Rui

Image matching, especially at the category level, is a challenging but important problem in vision. Advances in image matching largely depend on advances in image features. In view of the recent success of image features learned by DNNs, we propose an image matching algorithm based on a DNN feature pyramid, named DNN Flow. Because the DNN feature pyramid detects patterns at different levels, it is suitable for matching two images in a coarse-to-fine manner: the top level coarsely matches two images at the object level, the middle level matches them at the part level, and the low level finely matches them at the pixel level. The coarse-to-fine matching based on the DNN feature pyramid is formulated as a series of optimization problems that take into account the guidance from the top level. Extensive experiments demonstrate the superiority of DNN Flow in image matching under challenging variations.

IEEE Transactions on Multimedia | 2015

Learning Cross Space Mapping via DNN Using Large Scale Click-Through Logs

Wei Yu; Kuiyuan Yang; Yalong Bai; Hongxun Yao; Yong Rui

The gap between low-level visual signals and high-level semantics has been progressively bridged by the continuous development of deep neural networks (DNNs). With recent progress in DNNs, almost all image classification tasks have achieved new accuracy records. To extend the ability of DNNs to image retrieval tasks, we propose a unified DNN model for image-query similarity calculation that simultaneously models the image and the query in one network. The unified DNN is named the cross space mapping (CSM) model and contains two parts: a convolutional part and a query-embedding part. The image and the query are mapped to a common vector space via these two parts respectively, and image-query similarity is naturally defined as the inner product of their mappings in that space. To ensure good generalization ability, we learn the weights of the DNN from a large number of click-through logs, which consist of 23 million clicked image-query pairs between 1 million images and 11.7 million queries. Both the qualitative and the quantitative results on an image retrieval evaluation task with 1000 queries demonstrate the superiority of the proposed method.
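The inner-product similarity in a common vector space can be sketched as follows. This is only an illustration under strong assumptions: the two fixed linear projections below are hypothetical stand-ins for the convolutional part and the query-embedding part of the CSM model, and the toy feature vectors are invented.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical projections into a 2-d common space. In the paper these mappings
# are learned from click-through logs; here they are fixed toy matrices.
IMAGE_PROJ = [[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]]   # 3-d image feature -> 2-d common space
QUERY_PROJ = [[1.0, 0.0],
              [0.0, 1.0]]        # 2-d query feature -> 2-d common space


def similarity(image_feat, query_feat):
    """Image-query similarity as the inner product of the two mapped vectors."""
    image_vec = matvec(IMAGE_PROJ, image_feat)
    query_vec = matvec(QUERY_PROJ, query_feat)
    return dot(image_vec, query_vec)


score_match = similarity([1.0, 0.0, 0.0], [1.0, 0.0])
score_mismatch = similarity([0.0, 1.0, 0.0], [1.0, 0.0])
```

With these toy inputs, the image whose mapping aligns with the query's mapping scores higher than the one whose mapping is orthogonal to it.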

International Conference on Multimedia and Expo | 2016

Improve dog recognition by mining more information from both click-through logs and pre-trained models

Guotian Xie; Kuiyuan Yang; Yalong Bai; Min Shang; Yong Rui; Jian-Huang Lai

Dog breed recognition is a typical fine-grained image classification task, which requires both more training images to describe each dog breed and better models to automatically discriminate between breeds. In this paper, we use click-through logs as source data and a pre-trained deep convolutional neural network (DCNN) as the initial model to build our dog recognizer. To improve recognition accuracy, we propose to mine more useful information from both the data and the model. More information is mined from the data by collecting more images for each dog breed, which is done by automatically finding more dog-related words. More information is mined from the pre-trained DCNN by keeping the related neurons in the last layer, which are usually ignored in previous methods. Extensive offline experiments show consistent improvement from the proposed method. We also participated in the "MSR Image Recognition Challenge (IRC) @ ICME2016" under the setting of not using external data. In the online evaluation, our method achieved second place among all methods from both tracks (using and not using external data), and outperformed the other methods that did not use external data by a large margin (86.90% vs. 71.35% in top-5 accuracy).

ACM Transactions on Multimedia Computing, Communications, and Applications | 2018

Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition

Yalong Bai; Kuiyuan Yang; Tao Mei; Wei-Ying Ma; Tiejun Zhao

Large-scale image datasets and deep convolutional neural networks (DCNNs) are the two primary driving forces behind the rapid progress in generic object recognition tasks in recent years. While many network architectures have been designed to pursue lower error rates, few efforts have been devoted to enlarging existing datasets, due to high labeling costs and unfair comparison issues. In this article, we aim to achieve lower error rates by augmenting existing datasets in an automatic manner. Our method leverages both the web and DCNNs: the web provides massive numbers of images with rich contextual information, and the DCNN replaces humans in automatically labeling images under the guidance of web contextual information. Experiments show that our method can automatically and significantly scale up existing datasets from billions of web pages with high accuracy. Performance on object recognition tasks and transfer learning tasks is significantly improved by using the automatically augmented datasets, which demonstrates that more supervisory information has been automatically gathered from the web. Both the dataset and the models trained on it have been made publicly available.

ACM Multimedia | 2015