Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Bailan Feng is active.

Publication


Featured research published by Bailan Feng.


International Conference on Image Processing | 2014

Image character recognition using deep convolutional neural network learned from different languages

Jinfeng Bai; Zhineng Chen; Bailan Feng; Bo Xu

This paper proposes a shared-hidden-layer deep convolutional neural network (SHL-CNN) for image character recognition. In SHL-CNN, the hidden layers are made common across characters from different languages, performing a universal feature extraction process that aims at learning common character traits that exist across languages, such as strokes, while the final softmax layer is made language dependent and is trained only on characters from the target language. This paper is the first attempt to introduce the SHL-CNN framework to image character recognition. Under the SHL-CNN framework, we discuss several issues, including the architecture and the training of the network, from which a suitable SHL-CNN model for image character recognition is empirically learned. The effectiveness of the learned SHL-CNN is verified on both English and Chinese image character recognition tasks, showing that the SHL-CNN can reduce recognition errors by 16-30% relative to conventional CNN models trained on characters of a single language, and by 35.7% relative to state-of-the-art methods. In addition, the learned shared hidden layers are also useful for unseen image character recognition tasks.
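
As an illustration of the shared-hidden-layer idea, the sketch below pairs a common convolutional trunk with per-language softmax heads. This is a minimal PyTorch sketch, not the paper's architecture: the layer sizes, input resolution, and class counts are assumptions.

```python
import torch
import torch.nn as nn

class SHLCNN(nn.Module):
    def __init__(self, num_classes_per_lang):
        super().__init__()
        # Hidden layers shared across languages: a universal feature extractor.
        self.shared = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),  # assumes 32x32 inputs
        )
        # Language-dependent softmax layers, each trained only on its language.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(256, n) for lang, n in num_classes_per_lang.items()
        })

    def forward(self, x, lang):
        return self.heads[lang](self.shared(x))

# Class counts below are assumptions, not the paper's character sets.
model = SHLCNN({"english": 62, "chinese": 3755})
logits = model(torch.randn(4, 1, 32, 32), "chinese")  # shape (4, 3755)
```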


International Conference on Artificial Neural Networks | 2014

Chinese Image Character Recognition Using DNN and Machine Simulated Training Samples

Jinfeng Bai; Zhineng Chen; Bailan Feng; Bo Xu

Inspired by the success of deep neural network (DNN) models in solving challenging visual problems, this paper studies the task of Chinese Image Character Recognition (ChnICR) by leveraging a DNN model and a huge number of machine-simulated training samples. To generate the samples, clean machine-born Chinese characters are extracted and augmented with common variations of image characters, such as changes in size, font, boldness, shift, and complex backgrounds, producing over 28 million character images in total and covering the vast majority of occurrences of Chinese characters in real-life images. Based on these samples, a DNN training procedure is employed to learn an appropriate Chinese character recognizer, where the width and depth of the DNN and the volume of samples are discussed empirically. In parallel, a holistic Chinese image text recognition system is developed. Encouraging experimental results on text from 13 TV channels demonstrate the effectiveness of the learned recognizer, which shows significant performance gains over the baseline system.
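
A minimal sketch of the sample-simulation idea using Pillow: render a clean character, then randomize size, shift, and background. The font path, canvas size, and variation ranges are assumptions for illustration, not the paper's generation pipeline.

```python
import random
from PIL import Image, ImageDraw, ImageFont

def simulate_sample(char, font_path="simhei.ttf", canvas=48):
    """Render one machine-simulated training image for a Chinese character."""
    bg = random.randint(0, 255)                  # stand-in for complex backgrounds
    img = Image.new("L", (canvas, canvas), color=bg)
    draw = ImageDraw.Draw(img)
    size = random.randint(24, 40)                # size variation
    dx = random.randint(-4, 4)                   # horizontal shift
    dy = random.randint(-4, 4)                   # vertical shift
    font = ImageFont.truetype(font_path, size)   # font_path is an assumption
    draw.text((canvas // 2 + dx, canvas // 2 + dy), char,
              fill=255 - bg, font=font, anchor="mm")
    return img

samples = [simulate_sample("中") for _ in range(10)]
```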


International Conference on Acoustics, Speech, and Signal Processing | 2012

Multi-modal information fusion for news story segmentation in broadcast video

Bailan Feng; Peng Ding; Jiansong Chen; Jinfeng Bai; Su Xu; Bo Xu

With the fast development of high-speed networks and digital video recording technologies, broadcast video has been playing an increasingly important role in our daily life. In this paper, we propose a novel news story segmentation scheme that segments broadcast video into story units using a multi-modal information fusion (MMIF) strategy. Compared with traditional methods, the proposed scheme extracts a wealth of semantic-level features, including anchor person, topic caption, face, silence, acoustic change, audio keywords, and textual content. In parallel, we use a multi-modal information fusion strategy for news story boundary characterization by combining these visual, audio, and textual cues. Encouraging experimental results on the News Vision dataset demonstrate the effectiveness of the proposed scheme.
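
One simple way to realize such a fusion is a weighted late fusion of per-modality boundary scores, as in the hypothetical sketch below; the feature names, weights, and threshold are assumptions, not the paper's actual fusion scheme.

```python
import numpy as np

def fuse_boundary_scores(scores, weights):
    """scores: modality name -> per-candidate boundary probabilities."""
    fused = sum(weights[m] * np.asarray(p) for m, p in scores.items())
    return fused / sum(weights[m] for m in scores)

# Toy per-candidate probabilities for three candidate boundaries.
scores = {
    "anchor_person": [0.9, 0.1, 0.2],
    "topic_caption": [0.8, 0.2, 0.1],
    "audio_keyword": [0.7, 0.3, 0.3],
}
weights = {"anchor_person": 0.5, "topic_caption": 0.3, "audio_keyword": 0.2}
is_boundary = fuse_boundary_scores(scores, weights) > 0.5  # [True, False, False]
```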


International Conference on Multimedia Retrieval | 2013

A general framework of video segmentation to logical unit based on conditional random fields

Su Xu; Bailan Feng; Zhineng Chen; Bo Xu

Segmenting video into logical units, such as scenes in movies and topic units in news videos, is an essential prerequisite for a wide range of video-related applications. In this paper, a novel approach to logical unit segmentation based on conditional random fields (CRFs) is presented. In comparison with previous approaches that handle scenes and topic units separately, the proposed approach deals with both in a general framework. Specifically, four types of shots are defined and represented by four middle-level features, i.e., shot difference, scene transition, shot theme, and audio type. The problem of logical unit segmentation is then formulated as identifying the type of each shot based on the extracted features, by leveraging the CRF model. The proposed framework effectively integrates visual, audio, and contextual features, and it produces strong results for both scene and topic unit segmentation. The effectiveness of the proposed approach is verified on seven mainstream types of videos, with average F-measures of 88% and 86% reported on scenes and topic units respectively, illustrating that the proposed method can accurately segment logical units in different genres of videos.
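
To make the shot-labeling formulation concrete, the sketch below shows standard Viterbi decoding over per-shot state scores and transition scores, the inference step of a linear-chain model; the paper's trained CRF and its feature functions are not reproduced, and all scores here are placeholders.

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (T, S) log-scores of each shot type per shot;
    transition: (S, S) log-scores for moving between shot types."""
    T, S = emission.shape
    score = emission[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition       # rows: previous type, cols: current
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emission[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                            # one of the 4 shot types per shot

# Toy scores: 6 shots, 4 shot types, uniform transitions.
rng = np.random.default_rng(0)
labels = viterbi(np.log(rng.dirichlet(np.ones(4), size=6)),
                 np.log(np.full((4, 4), 0.25)))
```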


International Conference on Image Processing | 2012

Effective near-duplicate image retrieval with image-specific visual phrase selection

Jiansong Chen; Bailan Feng; Lei Zhu; Peng Ding; Bo Xu

Near-duplicate image retrieval (NDIR) is an important topic for many applications, such as multimedia content management and copyright infringement identification. In this work we propose a novel NDIR framework based on visual phrases. Compared with previous research, this paper first introduces a spatial visual phrase (SVP) model capable of capturing relative geometric information between visual words. It then proposes an image-specific strategy for selecting descriptive SVPs. The strategy not only handles the phrase sparseness problem that occurs in traditional selection strategies but also allows visual phrases to be selected according to the characteristics of each image. Experiments are carried out on the UKBench and TRECVID datasets, and encouraging results demonstrate that both the SVP model and the selection strategy significantly improve overall performance.
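
The sketch below illustrates one plausible reading of a spatial visual phrase: pairing nearby visual words and quantizing their relative distance and angle into the phrase. The neighborhood radius and bin counts are assumptions, not the paper's parameters.

```python
import math
from itertools import combinations

def spatial_visual_phrases(words, radius=50.0, dist_bins=4, angle_bins=8):
    """words: list of (word_id, x, y) from a quantized local-feature detector."""
    phrases = []
    for (w1, x1, y1), (w2, x2, y2) in combinations(words, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > radius:                           # only pair nearby words
            continue
        db = min(int(d / radius * dist_bins), dist_bins - 1)
        a = math.atan2(y2 - y1, x2 - x1) + math.pi       # angle in [0, 2*pi]
        ab = int(a / (2 * math.pi) * angle_bins) % angle_bins
        phrases.append((w1, w2, db, ab))         # word pair + quantized geometry
    return phrases

phrases = spatial_visual_phrases([(3, 10.0, 12.0), (7, 30.0, 40.0), (3, 200.0, 5.0)])
```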


Conference on Multimedia Modeling | 2014

Video to Article Hyperlinking by Multiple Tag Property Exploration

Zhineng Chen; Bailan Feng; Hongtao Xie; Rong Zheng; Bo Xu

Showing a video and an article on the same page, as done by official web agencies such as CNN.com and Yahoo!, provides a practical way for convenient information digestion. However, in the absence of accompanying articles, this layout is infeasible for mainstream web video repositories like YouTube. This paper investigates the problem of hyperlinking web videos to relevant articles available on the Web. Given a video, the task is accomplished by first identifying its contextual tags (e.g., who is doing what, where, and when) and then employing a search-based association to relevant articles. Specifically, we propose a multiple tag property exploration (mTagPE) approach to identify contextual tags, where tag relevance, tag clarity, and tag correlation are defined and measured by leveraging visual duplicate analyses, online knowledge bases, and tag co-occurrence. The identification task is then formulated as a random walk along a tag relation graph that smoothly integrates the three properties. The random walk aims at picking relevant, clear, and correlated tags as a set of contextual tags, which is further treated as a query issued to commercial search engines to obtain relevant articles. We have conducted experiments on a large-scale web video dataset. Both objective performance evaluations and subjective user studies show the effectiveness of the proposed hyperlinking: it produces more accurate contextual tags and thus a larger number of relevant articles than other approaches.
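
The random-walk step can be sketched as power iteration over a row-normalized tag relation graph with a per-tag prior, as below; the damping factor and the way the three tag properties would populate the graph weights and prior are assumptions.

```python
import numpy as np

def random_walk(W, prior, alpha=0.85, iters=100, tol=1e-8):
    """W: (N, N) nonnegative tag-relation weights (no all-zero rows assumed);
    prior: (N,) nonnegative per-tag scores."""
    P = W / W.sum(axis=1, keepdims=True)         # row-stochastic transitions
    p = prior / prior.sum()
    r = p.copy()
    for _ in range(iters):
        r_new = alpha * P.T @ r + (1 - alpha) * p
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r                                     # top-ranked tags form the query

# Toy graph over 3 tags.
W = np.array([[0.0, 1.0, 0.5], [1.0, 0.0, 0.2], [0.5, 0.2, 0.0]])
scores = random_walk(W, prior=np.array([0.6, 0.3, 0.1]))
```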


Fuzzy Systems and Knowledge Discovery | 2012

TV commercial detection using constrained viterbi algorithm based on time distribution

Bo Zhang; Bailan Feng; Peng Ding; Bo Xu

TV commercials play an important role in our lives, and automatic commercial detection is very useful in TV video analysis. Most previous works focus on the visual and audio features of commercials, while ignoring how commercial blocks are distributed across different program types and broadcast times. In this paper, we propose a novel method that fuses visual features, audio features, and these global characteristics to detect commercial blocks. First, visual and audio features such as FMPI (Image Frames Marked with Product Information) are used to predict the probability that a shot is a commercial, using an SVM classifier. These output probabilities are then regarded as observations of a Markov chain of commercial shots. Finally, a Viterbi algorithm with time constraints, which model the distributions of the duration and inter-arrival time of commercial blocks with a Gaussian mixture model (GMM), is applied to search for the optimal path of commercial shots. Experiments achieve promising performance on a real TV video database and show that the distributions of duration and inter-arrival time are good characteristics for capturing the global structure of commercial blocks.
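
To illustrate the time-distribution constraint, the sketch below fits a GMM to observed commercial-block durations with scikit-learn and exposes its log-likelihood as a term a constrained Viterbi search could add to each candidate path; the durations and component count are toy assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy commercial-block durations in seconds (real ones would come from data).
durations = np.array([[120.0], [90.0], [180.0], [150.0], [60.0]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(durations)

def duration_log_likelihood(block_seconds):
    """Higher value -> a more plausible commercial-block duration; a constrained
    Viterbi search could add this to the score of a candidate path."""
    return gmm.score_samples(np.array([[block_seconds]]))[0]

print(duration_log_likelihood(130.0), duration_log_likelihood(900.0))
```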


Multimedia Systems | 2014

Multiple style exploration for story unit segmentation of broadcast news video

Bailan Feng; Zhineng Chen; Rong Zheng; Bo Xu

Broadcast news video has been playing an increasingly important role in our daily life. However, how to effectively segment a broadcast news video into meaningful semantic story units remains a challenging issue. In this paper, we propose a novel unified video structure parsing approach, named multiple style exploration-based news story segmentation (MSE-NSS), to segment broadcast news videos into semantic story units. In MSE-NSS, we first investigate appropriate methods to explore multiple kinds of style information inherent in broadcast news videos, including temporal style inferred from caption texts, boundary style represented by a wealth of multi-modal visual-audio features, and structural style, known as the spanning duration of story units. The above style information is then integrated, and the task of story unit segmentation is accomplished through three steps: temporal style-based pre-location, boundary style-based description, and boundary-structural style-based segmentation, where the segmentation process is composed of an SVM-based detector and a dynamic programming-based refiner that considers the boundary style and the structural style collectively. In parallel, a news-oriented broadcast management system (NOBMS) is implemented on top of the proposed MSE-NSS. Encouraging experimental results on a large broadcast news video dataset demonstrate the effectiveness of the proposed MSE-NSS, as well as its superiority over traditional story unit segmentation methods.
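
The dynamic programming refiner can be sketched as choosing the boundary set that maximizes detector scores plus a duration prior over story-unit lengths, as below; the scoring functions are hypothetical stand-ins for the paper's SVM outputs and structural-style model.

```python
import math

def segment(boundary_score, dur_logp, n):
    """boundary_score[i]: detector score for a boundary after shot i;
    dur_logp(l): log-prior of a story unit spanning l shots (structural style)."""
    best = [0.0] + [-math.inf] * n               # best[i]: segmenting shots 0..i-1
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):                       # last unit covers shots j..i-1
            s = best[j] + dur_logp(i - j)
            if i < n:
                s += boundary_score[i - 1]       # boundary between shots i-1 and i
            if s > best[i]:
                best[i], prev[i] = s, j
    cuts, i = [], n
    while i > 0:
        cuts.append(i)
        i = prev[i]
    return sorted(cuts)                          # shot indices ending each unit

# Toy run: 6 shots, a score peak after shot 1, a prior favoring 3-shot units.
cuts = segment([0.1, 2.0, 0.1, 0.1, 0.1], lambda l: -abs(l - 3.0), 6)
```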


International Conference on Acoustics, Speech, and Signal Processing | 2012

Graph-based multi-modal scene detection for movie and teleplay

Su Xu; Bailan Feng; Peng Ding; Bo Xu

Automatic scene detection is a fundamental step for efficient video search and browsing. This paper presents our current work on scene detection, which integrates three effective strategies into a single framework. For each video, a coherence signal is first constructed from a graph model built on the similarity matrix within a temporal interval. Second, the signal is refined by scene transition graph (STG) analysis and audio classification, through which scene clues hidden in the multimedia content are discovered. Finally, the scene boundaries are identified by a window function. In experiments, we compare the proposed scene detection method with three typical algorithms on teleplays and movies; our method, yielding an average F-measure of 0.85, performs best.
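
A minimal sketch of a coherence signal in this spirit: for each candidate boundary, average the shot-to-shot similarities that cross it within a temporal window, so that low values suggest scene changes. The window size is an assumption, and the STG and audio refinement steps are omitted.

```python
import numpy as np

def coherence_signal(S, window=5):
    """S: (N, N) shot-similarity matrix; returns one score per candidate boundary."""
    n = len(S)
    coh = np.zeros(n - 1)
    for b in range(n - 1):                       # boundary between shots b and b+1
        left = range(max(0, b - window + 1), b + 1)
        right = range(b + 1, min(n, b + 1 + window))
        coh[b] = np.mean([S[i, j] for i in left for j in right])
    return coh                                   # local minima suggest scene changes

# Toy similarity matrix: two blocks of mutually similar shots.
S = np.ones((6, 6)) * 0.1
S[:3, :3] = S[3:, 3:] = 0.9
print(coherence_signal(S, window=3))             # dips at the block boundary
```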


International Conference on Multimedia Retrieval | 2015

Improving Automatic Name-Face Association using Celebrity Images on the Web

Zhineng Chen; Bailan Feng; Chong-Wah Ngo; Caiyan Jia; Xiangsheng Huang

This paper investigates the task of automatically associating faces appearing in images (or videos) with their names. Our novelty lies in the use of celebrity Web images to facilitate the task. Specifically, we first propose a method named Image Matching (IM), which uses the faces in images returned from name queries on an image search engine as the gallery set for the names; a probe face is classified as one of the names, or none of them, according to matching scores and compatibility characterized by a proposed Assigning-Thresholding (AT) pipeline. Noting that IM can provide guidance for the well-established Graph-based Association (GA), we further propose two methods that jointly utilize these two complementary cues: the early fusion of IM and GA (EF-IMGA), which takes the IM score as an additional information source to aid the association in GA, and the late fusion of IM and GA (LF-IMGA), which combines the scores obtained individually from IM and GA to make the association. Evaluations on datasets of captioned news images and Web videos both show that the proposed methods, especially the two fused ones, provide significant improvements over GA.
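
In the spirit of the late fusion variant (LF-IMGA), the sketch below combines per-name scores from the two sources and assigns a name only when the fused score clears a threshold; the weight, threshold, and names are toy assumptions, and the rejection step stands in loosely for the Assigning-Thresholding pipeline.

```python
def associate(im_scores, ga_scores, w=0.5, thresh=0.6):
    """im_scores, ga_scores: name -> score in [0, 1] for a single probe face."""
    names = set(im_scores) | set(ga_scores)
    fused = {n: w * im_scores.get(n, 0.0) + (1 - w) * ga_scores.get(n, 0.0)
             for n in names}
    name, score = max(fused.items(), key=lambda kv: kv[1])
    return name if score >= thresh else None     # None: face matches no name

print(associate({"Alice": 0.8, "Bob": 0.3}, {"Alice": 0.7, "Bob": 0.5}))  # "Alice"
```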

Collaboration


Dive into Bailan Feng's collaborations.

Top Co-Authors

Bo Xu, Chinese Academy of Sciences
Zhineng Chen, Chinese Academy of Sciences
Jinfeng Bai, Chinese Academy of Sciences
Peng Ding, Chinese Academy of Sciences
Jiansong Chen, Chinese Academy of Sciences
Su Xu, Chinese Academy of Sciences
Rong Zheng, Chinese Academy of Sciences
Bo Zhang, Chinese Academy of Sciences
Hongtao Xie, Chinese Academy of Sciences
Lei Zhu, Chinese Academy of Sciences