Publication


Featured research published by Shu Tian.


IEEE Transactions on Image Processing | 2016

Text Detection, Tracking and Recognition in Video: A Comprehensive Survey

Xu-Cheng Yin; Ze-Yu Zuo; Shu Tian; Cheng-Lin Liu

The intelligent analysis of video data is currently in wide demand because a video is a major source of sensory data in our lives. Text is a prominent and direct source of information in video, while the recent surveys of text detection and recognition in imagery focus mainly on text extraction from scene images. Here, this paper presents a comprehensive survey of text detection, tracking, and recognition in video with three major contributions. First, a generic framework is proposed for video text extraction that uniformly describes detection, tracking, recognition, and their relations and interactions. Second, within this framework, a variety of methods, systems, and evaluation protocols of video text extraction are summarized, compared, and analyzed. Existing text tracking techniques, tracking-based detection and recognition techniques are specifically highlighted. Third, related applications, prominent challenges, and future directions for video text extraction (especially from scene videos and web videos) are also thoroughly discussed.
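
The survey describes its generic framework only at a high level; as a rough illustration under our own assumptions, the loop below shows how the detection, tracking, and recognition stages might interact per frame. All names here (TextTrack, extract_video_text, the detect/track/recognize callables) are illustrative placeholders, not the paper's API.

```python
# Minimal sketch of a detection-tracking-recognition loop in the spirit of
# the survey's generic framework. Stage implementations are injected as
# callables so their interactions stay explicit.
from dataclasses import dataclass, field

@dataclass
class TextTrack:
    track_id: int
    boxes: list = field(default_factory=list)        # one (x, y, w, h) per frame
    transcripts: list = field(default_factory=list)  # one string per frame

def extract_video_text(frames, detect, track, recognize):
    """detect(frame) -> list of boxes; track(tracks, boxes) -> updated tracks;
    recognize(frame, box) -> string. Each stage can feed the next one."""
    tracks = []
    for frame in frames:
        boxes = detect(frame)          # per-frame text detection
        tracks = track(tracks, boxes)  # associate detections over time
        for t in tracks:
            if t.boxes:                # recognize the latest box of each track
                t.transcripts.append(recognize(frame, t.boxes[-1]))
    return tracks
```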


International Conference on Document Analysis and Recognition | 2015

Multi-strategy tracking based text detection in scene videos

Ze-Yu Zuo; Shu Tian; Wei-Yi Pei; Xu-Cheng Yin

Text detection and tracking in scene videos are important prerequisites for content-based video analysis and retrieval, wearable camera systems, and augmented-reality translators on mobile devices. Here, we present a novel multi-strategy tracking based text detection approach for scene videos. In this approach, a state-of-the-art scene text detection module [1] is first used to detect text in each video frame. Then a multi-strategy text tracking technique is proposed, which uses tracking by detection, spatio-temporal context learning, and linear prediction to sequentially predict candidate text locations, and adaptively selects the best matching text block from the candidates with a rule-based method. This multi-strategy technique combines the advantages of the three tracking techniques and compensates for their individual weaknesses. Experiments on a variety of scene videos show that our approach is effective and robust, reducing false alarms and improving detection accuracy.
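
As a hedged illustration of the rule-based selection step (not the paper's exact rules), the sketch below picks, among the three trackers' proposals, the box that best overlaps the per-frame detector output; the IoU criterion and threshold are our assumptions.

```python
# Illustrative candidate selection: three trackers each propose a text box,
# and the proposal best matching the per-frame detector's output is kept.
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def select_text_box(candidates, detections, min_iou=0.5):
    """candidates: boxes from tracking-by-detection, spatio-temporal context
    learning, and linear prediction; detections: detector boxes this frame."""
    best, best_score = None, min_iou
    for cand in candidates:
        score = max((iou(cand, d) for d in detections), default=0.0)
        if score >= best_score:
            best, best_score = cand, score
    return best  # None means no candidate matched; fall back to prediction
```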


International Conference on Neural Information Processing | 2012

Pedestrian analysis and counting system with videos

Zhi-Bin Wang; Hongwei Hao; Yan Li; Xu-Cheng Yin; Shu Tian

Reliable estimation of the number of pedestrians plays an important role in the management of public places. However, accurately counting pedestrians in the presence of abnormal-behavior noise is a key challenge for such surveillance systems. To deal with this problem, we propose a new and efficient framework for pedestrian analysis and counting, which consists of two main steps. First, a rule-induction classifier with optical-flow features is designed to recognize abnormal behaviors. Then, a linear regression model is used to learn the relationship between the number of foreground pixels and the number of pedestrians. Consequently, our system can count pedestrians precisely in general scenes without being influenced by abnormal behaviors. Experimental results on videos of different scenes show that our system achieves accuracies of 98.59% and 96.04% for abnormal behavior recognition and pedestrian counting, respectively. Furthermore, it is robust to variations in lighting and noise.
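
The counting stage reduces to a one-variable linear regression; the toy sketch below fits count = a * pixels + b by least squares. The foreground-pixel feature and the training numbers are illustrative, not the paper's data.

```python
# Toy sketch of the counting stage: fit a linear model mapping the number
# of foreground pixels in a frame to a pedestrian count.
import numpy as np

def fit_count_model(pixel_counts, pedestrian_counts):
    """Least-squares fit of count = a * pixels + b."""
    X = np.vstack([pixel_counts, np.ones(len(pixel_counts))]).T
    (a, b), *_ = np.linalg.lstsq(X, np.asarray(pedestrian_counts), rcond=None)
    return a, b

a, b = fit_count_model([1200, 2500, 4100], [3, 6, 10])  # made-up training data
print(round(a * 3000 + b))  # estimated pedestrians for 3000 foreground pixels
```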


IEEE Transactions on Image Processing | 2017

Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework With Dynamic Programming

Chun Yang; Xu-Cheng Yin; Wei-Yi Pei; Shu Tian; Ze-Yu Zuo; Chao Zhu; Junchi Yan

There are a variety of grand challenges for multi-orientation text detection in scene videos, where the typical issues include skew distortion, low contrast, and arbitrary motion. Most conventional video text detection methods, which use individual frames, have limited performance. In this paper, we propose a novel tracking based multi-orientation scene text detection method that uses multiple frames within a unified framework via dynamic programming. First, a multi-information fusion-based multi-orientation text detection method is applied in each frame to extensively locate possible character candidates and extract text regions over multiple channels and scales. Second, an optimal tracking trajectory is learned and linked globally over consecutive frames by dynamic programming to refine the detection results with all detection, recognition, and prediction information. The effectiveness of our proposed system is evaluated on several public datasets of multi-orientation scene text images and videos, including MSRA-TD500, USTB-SV1K, and ICDAR 2015 Scene Videos, where it achieves state-of-the-art performance.
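
To make the dynamic-programming linking concrete, here is a minimal sketch that chooses one detection per frame so that the summed detection scores plus pairwise link affinities are maximized; the affinity callable stands in for the combined detection/recognition/prediction cues and is an assumption, not the paper's scoring function.

```python
# Hedged sketch of trajectory linking by dynamic programming over frames.
def best_trajectory(frames, affinity):
    """frames: list of lists of (box, score); affinity(box_a, box_b) -> float.
    Returns the index of the chosen detection in each frame."""
    if not frames:
        return []
    # dp[i][j]: best total score of a trajectory ending at detection j of frame i
    dp = [[s for _, s in frames[0]]]
    back = [[-1] * len(frames[0])]
    for i in range(1, len(frames)):
        row, brow = [], []
        for box, score in frames[i]:
            prev = max(range(len(frames[i - 1])),
                       key=lambda k: dp[i - 1][k] + affinity(frames[i - 1][k][0], box))
            row.append(dp[i - 1][prev] + affinity(frames[i - 1][prev][0], box) + score)
            brow.append(prev)
        dp.append(row)
        back.append(brow)
    # backtrack from the best final detection
    j = max(range(len(frames[-1])), key=lambda k: dp[-1][k])
    path = [j]
    for i in range(len(frames) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```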


Advances in Multimedia | 2013

Weakly Supervised Compressive Tracking with Effective Prediction Model

Yin Liang; Xu-Cheng Yin; Shu Tian; Hongwei Hao

Recently, compressive sensing has been widely used in the field of object tracking. As a typical representative, Compressive Tracking (CT) outperforms many state-of-the-art approaches, but it has a drawback: the scale of the object is fixed throughout the tracking process. To solve this problem and further improve performance, in this paper we propose a Weakly Supervised Compressive Tracking (WSCT) approach. First, we introduce an effective prediction model based on optical flow, which achieves reliable estimation of the position and scale of the current object. Second, through the prediction model, samples around the predicted position in the current frame are further used as weakly supervised information to guide the update of the compressive classifiers. In this way, the updating strategy combines current and future information to alleviate object drift. Finally, the updated classifiers are used to locate the final object position. Our WSCT algorithm is robust to large scale changes, especially in film and television videos. Experiments and comparisons on challenging tracking sequences show the effectiveness of our approach.
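
As a rough sketch of the optical-flow prediction idea (with the scale update and the weakly supervised classifier update omitted), the function below shifts the previous bounding box by the median pyramidal Lucas-Kanade displacement of sample points inside it; the grid density and use of the median are our choices.

```python
# Illustrative prediction step: track a grid of points inside the previous
# box with Lucas-Kanade optical flow and shift the box by their median
# displacement. Inputs are uint8 grayscale frames.
import numpy as np
import cv2

def predict_box(prev_gray, cur_gray, box):
    x, y, w, h = box
    xs = np.linspace(x, x + w, 10)
    ys = np.linspace(y, y + h, 10)
    pts = np.array([[px, py] for px in xs for py in ys], np.float32).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good = status.reshape(-1) == 1
    if not good.any():
        return box  # flow failed; keep the previous box
    dx, dy = np.median((nxt - pts).reshape(-1, 2)[good], axis=0)
    return (x + dx, y + dy, w, h)
```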


Chinese Conference on Pattern Recognition | 2016

Robust Segmentation for Video Captions with Complex Backgrounds

Zong-Heng Xing; Fang Zhou; Shu Tian; Xu-Cheng Yin

Caption text contains rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. An AlexNet CNN is first trained with path signatures for text tracking. Then we use an improved adaptive thresholding method to segment caption text in individual frames. Finally, multi-frame integration is conducted with gamma correction and region growing. In contrast to conventional methods, which extract video text from individual frames independently, we exploit the specific temporal characteristics of videos to perform segmentation. Moreover, the proposed method can effectively remove complex backgrounds whose intensity is similar to the text. Experimental results on different videos and comparisons with other methods show the effectiveness of our approach.
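
A loose sketch of the per-frame thresholding plus temporal-integration idea, assuming aligned grayscale crops of the same caption region: gamma-correct each crop, threshold it adaptively, and keep only pixels that stay text-like across frames. The parameter values are guesses, and the paper's path-signature tracking and region growing are omitted.

```python
# Rough sketch: adaptive thresholding per frame, then temporal intersection.
# Caption text is stable across frames while background pixels fluctuate.
import numpy as np
import cv2

def segment_caption(gray_crops, gamma=0.8):
    """gray_crops: aligned uint8 grayscale crops of one caption region."""
    masks = []
    for crop in gray_crops:
        # gamma correction to suppress dim background of similar intensity
        img = np.uint8(((crop / 255.0) ** gamma) * 255)
        mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 11, 2)
        masks.append(mask > 0)
    # keep only pixels classified as text in every frame
    return np.logical_and.reduce(masks).astype(np.uint8) * 255
```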


Computational and Mathematical Methods in Medicine | 2015

A Video-Based Intelligent Recognition and Decision System for the Phacoemulsification Cataract Surgery

Shu Tian; Xu-Cheng Yin; Zhi-Bin Wang; Fang Zhou; Hong-Wei Hao

Phacoemulsification is one of the most advanced surgical treatments for cataract. However, conventional procedures offer a low level of automation and rely heavily on the surgeon's skill. An attractive alternative is to use video processing and pattern recognition technologies to automatically detect the cataract grade and intelligently control the release of ultrasonic energy during the operation. Unlike cataract grading in diagnosis systems based on static images, dynamic surgery videos introduce complicated backgrounds, unexpected noise, and varied information. Here we develop a Video-Based Intelligent Recognition and Decision (VeBIRD) system, which breaks new ground by providing a generic framework for automatically tracking the operation process and classifying the cataract grade in microscope videos of phacoemulsification cataract surgery. VeBIRD comprises a robust eye (iris) detector using the randomized Hough transform to precisely locate the eye against the noisy background, an effective probe tracker based on Tracking-Learning-Detection to track the operation probe through the dynamic process, and an intelligent decider with discriminative learning to recognize the cataract grade in the complicated video. Experiments with a variety of real microscope videos of phacoemulsification verify VeBIRD's effectiveness.
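
As an illustration of the eye-localization step, the sketch below uses OpenCV's standard Hough circle detector as a stand-in for the randomized Hough transform the paper employs; every parameter value here is a placeholder.

```python
# Toy eye (iris) localization: find the strongest circle in the frame.
# cv2.HoughCircles is the classical gradient-based Hough detector, used
# here instead of the paper's randomized Hough transform.
import cv2

def locate_iris(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)  # suppress surgical-video noise
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.5,
                               minDist=gray.shape[0] // 2,
                               param1=100, param2=50,
                               minRadius=gray.shape[0] // 8,
                               maxRadius=gray.shape[0] // 2)
    if circles is None:
        return None
    x, y, r = circles[0][0]  # strongest circle: iris center and radius
    return int(x), int(y), int(r)
```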


International Symposium on Neural Networks | 2013

Transfer learning based compressive tracking

Shu Tian; Xu-Cheng Yin; Xi Xu; Hongwei Hao

Although existing online tracking algorithms can handle scene illumination changes, partial or full object occlusions, and pose variation, two weaknesses remain: inadequate training data and the drift problem. Considering these, the Compressive Tracking (CT) algorithm [1] extracts features from the compressed domain and classifies object and background via a naive Bayes classifier with online updates. To further address drift and the inadequacy of training data, we introduce transfer learning into CT to take full advantage of prior information and propose a self-training-like transfer learning algorithm. It selects training samples from the sample collection to update the classifier, guided by the classifier constructed at the first frame. We incorporate this self-training-like transfer learning algorithm into CT to construct a novel tracking algorithm called Transfer Learning based Compressive Tracking (TLCT). Experimental results on 17 publicly available challenging sequences show the effectiveness and robustness of our algorithm.
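
The self-training-like sample selection can be sketched as follows, assuming the first-frame classifier exposes a posterior probability: only samples it labels confidently are passed on to update the online classifier. The threshold and all function names are our assumptions, not the paper's.

```python
# Simplified sketch of the selection rule: a classifier frozen at the first
# frame scores candidate samples, and only confidently labeled ones are
# kept as training data for the online compressive classifier.
import numpy as np

def select_transfer_samples(first_frame_clf, features, threshold=0.8):
    """first_frame_clf(x) -> P(object | x). Returns (samples, labels)."""
    keep, labels = [], []
    for x in features:
        p = first_frame_clf(x)
        if p >= threshold:
            keep.append(x); labels.append(1)   # confident object sample
        elif p <= 1 - threshold:
            keep.append(x); labels.append(0)   # confident background sample
    return np.array(keep), np.array(labels)    # ambiguous samples are discarded
```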


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

A Unified Framework for Tracking Based Text Detection and Recognition from Web Videos

Shu Tian; Xu-Cheng Yin; Ya Su; Hongwei Hao


International Joint Conference on Artificial Intelligence | 2016

Scene text detection in video by learning locally and globally

Shu Tian; Wei-Yi Pei; Ze-Yu Zuo; Xu-Cheng Yin

Collaboration


Top co-authors of Shu Tian and their affiliations:

Xu-Cheng Yin, University of Science and Technology Beijing
Hongwei Hao, Chinese Academy of Sciences
Wei-Yi Pei, University of Science and Technology Beijing
Ze-Yu Zuo, University of Science and Technology Beijing
Chun Yang, University of Science and Technology Beijing
Fang Zhou, University of Science and Technology Beijing
Zhi-Bin Wang, University of Science and Technology Beijing
Chao Zhu, University of Science and Technology Beijing
Cheng-Lin Liu, Chinese Academy of Sciences