
Publication


Featured research published by Xuejian Rong.


International Conference on Multimedia and Expo | 2014

Scene text recognition in multiple frames based on text tracking

Xuejian Rong; Chucai Yi; Xiaodong Yang; Yingli Tian

Text signage serves as a visual indicator in natural scenes and plays an important role in navigation and notification in daily life. Most previous scene text extraction methods operate on a single scene image. In this paper, we propose a multi-frame scene text recognition method that tracks text regions in a video captured by a moving camera. The main contributions of this paper are as follows. First, we present a framework for scene text recognition in multiple frames, based on a feature representation of scene text characters (STCs) for character prediction and a conditional random field (CRF) model for word configuration. Second, the STC feature representation is built from densely sampled SIFT descriptors aggregated into a Fisher vector. Third, we collect a dataset for text information extraction from natural scene videos. The proposed multi-frame scene text recognition is more compatible with image/video-based mobile applications. The experimental results demonstrate that STC prediction and word configuration in multiple frames based on text tracking significantly improve the performance of scene text recognition.
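
The abstract gives no implementation details; as a rough sketch of the second contribution, the snippet below encodes densely sampled SIFT descriptors into a Fisher vector with OpenCV and scikit-learn. The GMM size, sampling stride, and normalization choices are assumptions, not values from the paper.

```python
# Hypothetical sketch: Fisher vector encoding of densely sampled SIFT
# descriptors, loosely following the STC representation described above.
# The GMM size, sampling stride, and normalizations are assumptions.
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def dense_sift(gray, step=8, size=16):
    """Compute SIFT descriptors on a regular grid of keypoints."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), size)
           for y in range(0, gray.shape[0], step)
           for x in range(0, gray.shape[1], step)]
    _, desc = sift.compute(gray, kps)
    return desc  # (num_points, 128)

def fisher_vector(desc, gmm):
    """First- and second-order Fisher vector w.r.t. a diagonal GMM."""
    q = gmm.predict_proba(desc)                  # (N, K) soft assignments
    mu, var = gmm.means_, gmm.covariances_       # (K, D) each
    n = desc.shape[0]
    fv = []
    for k in range(gmm.n_components):
        d = (desc - mu[k]) / np.sqrt(var[k])     # whitened residuals
        w = q[:, k:k + 1]
        # Gradients w.r.t. mean and variance, normalized by prior weight.
        g_mu = (w * d).sum(axis=0) / (n * np.sqrt(gmm.weights_[k]))
        g_var = (w * (d ** 2 - 1)).sum(axis=0) / (n * np.sqrt(2 * gmm.weights_[k]))
        fv.extend([g_mu, g_var])
    fv = np.concatenate(fv)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))       # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalization

# Usage: fit the GMM on descriptors pooled from training character patches.
# patches = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in train_paths]
# gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(
#     np.vstack([dense_sift(p) for p in patches]))
# stc_feature = fisher_vector(dense_sift(test_patch), gmm)
```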


IEEE Transactions on Circuits and Systems for Video Technology | 2017

Evaluation of Low-Level Features for Real-World Surveillance Event Detection

Yang Xian; Xuejian Rong; Xiaodong Yang; Yingli Tian

Event detection aims to recognize and localize specified spatio-temporal patterns in videos. Most human activity recognition research in past decades experimented on relatively clean scenes with limited actors performing explicit actions. Recently, more attention has been paid to real-world surveillance videos, in which human activity recognition is more challenging due to large variations caused by factors such as scaling, resolution, viewpoint, cluttered background, and crowdedness. In this paper, we systematically evaluate seven different types of low-level spatio-temporal features in the context of surveillance event detection (SED) using a uniform experimental setup. A Fisher vector is employed to aggregate the low-level features into the representation of each video clip. A set of random forests is then learned as the classification models. To bridge research efforts and real-world applications, we use the NIST TRECVID SED task as our testbed, in which seven predefined events involve different levels of human activity analysis. Strengths and limitations of each low-level feature type are analyzed and discussed.
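
As a schematic of the classification stage described above (a set of random forests over clip-level aggregated features), here is a minimal sketch using scikit-learn; the event names, feature dimensions, and hyperparameters are placeholders rather than the paper's actual setup.

```python
# Hypothetical sketch of the classification stage: one random forest per
# predefined event, trained on clip-level aggregated feature vectors.
# Shapes, event names, and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
EVENTS = ["PersonRuns", "Embrace", "Pointing"]   # placeholder event names

# Stand-in for Fisher-vector-aggregated clips: (num_clips, feature_dim).
X_train = rng.normal(size=(200, 256))
y_train = {e: rng.integers(0, 2, size=200) for e in EVENTS}  # binary labels
X_test = rng.normal(size=(50, 256))

# A set of random forests, one binary detector per event, as described above.
forests = {}
for event in EVENTS:
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train[event])
    forests[event] = clf

# Score each test clip against every event and report detections.
for event, clf in forests.items():
    scores = clf.predict_proba(X_test)[:, 1]    # probability of the event
    print(event, "detected in", int((scores > 0.5).sum()), "clips")
```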


International Conference on Multimedia and Expo | 2016

Depth-aware indoor staircase detection and recognition for the visually impaired

Rai Munoz; Xuejian Rong; Yingli Tian

A mobile vision-based navigation aid can assist visually impaired people in traveling independently, especially in unfamiliar environments. Although many effective navigation algorithms have been developed in recent decades, accurate, efficient, and reliable staircase detection for indoor navigation remains a challenging problem. In this paper, we propose an effective indoor staircase detection algorithm based on an RGB-D camera. Staircase candidates are first detected in RGB frames by extracting sets of concurrent parallel lines with a Hough transform. The complementary depth frames are then used to classify each candidate as upstairs, downstairs, or negative (i.e., a corridor). A support vector machine (SVM) based multi-classifier is trained and tested for staircase recognition on our newly collected staircase dataset. The detection and recognition results demonstrate the effectiveness and efficiency of the proposed algorithm.
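
A minimal sketch of the RGB stage might look like the following: probabilistic Hough line detection followed by grouping of near-parallel segments into staircase candidates. All thresholds are illustrative assumptions, and the commented SVM step only gestures at the depth-based recognition stage.

```python
# Hypothetical sketch of the RGB stage described above: probabilistic Hough
# line detection, then grouping near-parallel segments as staircase
# candidates. All thresholds are illustrative, not the paper's values.
import cv2
import numpy as np

def staircase_candidates(bgr, angle_tol_deg=5.0, min_parallel=5):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                            minLineLength=80, maxLineGap=10)
    if lines is None:
        return []
    segs = lines[:, 0, :]                          # (N, 4) as x1, y1, x2, y2
    angles = np.degrees(np.arctan2(segs[:, 3] - segs[:, 1],
                                   segs[:, 2] - segs[:, 0])) % 180
    # Group segments whose orientations agree within the tolerance;
    # a staircase appears as many concurrent parallel step edges.
    groups = []
    used = np.zeros(len(segs), dtype=bool)
    for i in range(len(segs)):
        if used[i]:
            continue
        close = np.abs((angles - angles[i] + 90) % 180 - 90) < angle_tol_deg
        if close.sum() >= min_parallel:
            groups.append(segs[close])
            used |= close
    return groups  # each group is a candidate staircase region

# The depth stage would then crop the matching depth patch for each group
# and feed a depth-profile feature to an SVM multi-classifier, e.g.:
# from sklearn.svm import SVC
# clf = SVC(kernel='rbf').fit(depth_features, labels)  # up/down/negative
```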


European Conference on Computer Vision | 2016

ISANA: Wearable Context-Aware Indoor Assistive Navigation with Obstacle Avoidance for the Blind

Bing Li; J. Pablo Munoz; Xuejian Rong; Jizhong Xiao; Yingli Tian; Aries Arditi

This paper presents a novel mobile wearable context-aware indoor mapping and navigation system with obstacle avoidance for the blind. The system includes an indoor map editor and an app on Tango devices with multiple modules. The indoor map editor parses spatial semantic information from a building architectural model and represents it as a high-level semantic map to support context awareness. An obstacle avoidance module detects objects in front of the user with a depth sensor. Based on the Tango's ego-motion tracking, localization alignment on the semantic map, and obstacle detection, the system automatically generates a safe path to a desired destination. A speech-audio interface handles user input and delivers guidance and alert cues in real time, using a priority-based mechanism to reduce the user's cognitive load. Field tests involving blindfolded and blind subjects demonstrate that the proposed prototype performs context-aware and safe indoor assistive navigation effectively.
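
The paper does not specify how the priority-based speech mechanism is implemented; the sketch below shows one plausible shape for it, a bounded priority queue in which obstacle alerts preempt guidance and contextual messages. The priority levels, message kinds, and queue bound are assumptions.

```python
# Hypothetical sketch of a priority-based speech cue queue like the one
# described above: higher-priority alerts (e.g., obstacles) preempt lower-
# priority guidance so the user is not overloaded. Priority levels, message
# kinds, and the queue bound are illustrative assumptions.
import heapq
import itertools

PRIORITY = {"obstacle_alert": 0, "turn_guidance": 1, "context_info": 2}

class SpeechQueue:
    def __init__(self, max_pending=3):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order
        self._max_pending = max_pending

    def push(self, kind, text):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), text))
        # Drop the lowest-priority backlog to bound the user's cognitive load.
        while len(self._heap) > self._max_pending:
            self._heap.remove(max(self._heap))
            heapq.heapify(self._heap)

    def pop_next(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = SpeechQueue()
q.push("context_info", "Room 305 is on your left")
q.push("turn_guidance", "Turn right in two meters")
q.push("obstacle_alert", "Obstacle ahead, stop")
print(q.pop_next())   # -> "Obstacle ahead, stop" (highest priority first)
```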


European Conference on Computer Vision | 2016

Recognizing Text-Based Traffic Guide Panels with Cascaded Localization Network

Xuejian Rong; Chucai Yi; Yingli Tian

In this paper, we introduce a new top-down framework for automatic localization and recognition of text-based traffic guide panels (http://tinyurl.com/wiki-guide-signs) captured by car-mounted cameras in natural scenes. The proposed framework makes two contributions. First, a novel Cascaded Localization Network (CLN) joining two customized convolutional nets is proposed to detect the guide panels and the scene text on them in a coarse-to-fine manner. In this network, the popular character-wise text saliency detection is replaced with string-wise text region detection, which avoids numerous bottom-up processing steps such as character clustering and segmentation. Text within the detected regions is then interpreted by a deep recurrent model without requiring character segmentation. Second, a temporal fusion of text region proposals across consecutive frames is introduced to significantly reduce redundant computation in neighboring frames. A new, challenging Traffic Guide Panel dataset is collected to train and evaluate the proposed framework, since existing symbol-based traffic sign datasets are unsuited to this task. Experimental results demonstrate that our framework outperforms multiple recently published text spotting frameworks in real highway scenarios.
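
As an illustration of the temporal-fusion idea, the sketch below matches each frame's text region proposals to the previous frame's tracked panels by IoU, reusing recognition results for matched regions so that only new proposals reach the recognizer. The IoU threshold and data layout are assumptions, not the paper's design.

```python
# Hypothetical sketch of the temporal fusion idea described above: a text
# region proposal that overlaps strongly with one from the previous frame is
# treated as the same panel, so its recognition result can be reused instead
# of recomputed. The IoU threshold is an illustrative assumption.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def fuse_proposals(prev_tracks, proposals, thresh=0.5):
    """Match this frame's proposals to previous tracks; return (tracks, new)."""
    tracks, fresh = [], []
    unmatched = list(prev_tracks)
    for box in proposals:
        best = max(unmatched, key=lambda t: iou(t["box"], box), default=None)
        if best is not None and iou(best["box"], box) >= thresh:
            unmatched.remove(best)
            tracks.append({"box": box, "text": best["text"]})  # reuse result
        else:
            fresh.append(box)   # only these go through the recognition model
    return tracks, fresh

prev = [{"box": (100, 50, 300, 120), "text": "EXIT 12 Albany"}]
tracks, fresh = fuse_proposals(prev, [(105, 55, 305, 125), (400, 60, 500, 100)])
print(len(tracks), "reused,", len(fresh), "sent to the recognizer")
```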


Robotics and Biomimetics | 2015

Assisting blind people to avoid obstacles: A wearable obstacle stereo feedback system based on 3D detection

Bing Li; Xiaochen Zhang; J. Pablo Munoz; Jizhong Xiao; Xuejian Rong; Yingli Tian

A wearable Obstacle Stereo Feedback (OSF) system based on 3D obstacle detection is presented to assist blind people with navigation. The OSF system embeds a depth sensor to perceive the 3D spatial information in front of the user in the form of point clouds. We implement downsampling and the Random Sample Consensus (RANSAC) algorithm to process the perceived point cloud and detect the obstacles in front of the user. Finally, Head-Related Transfer Functions (HRTFs) are applied to create virtual stereo sound that represents each obstacle according to its coordinates in 3D space. Experiments show that the OSF system can detect obstacles in indoor environments effectively and provides feasible auditory perception to indicate the safety zone in front of the blind user.
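
A minimal NumPy sketch of this detection pipeline, assuming voxel-grid downsampling and a three-point RANSAC plane fit (the voxel size, distance threshold, and synthetic test cloud are illustrative):

```python
# Hypothetical NumPy sketch of the detection stage described above:
# downsample the point cloud, fit the dominant (ground) plane with RANSAC,
# and treat the remaining points as obstacle candidates.
import numpy as np

def ransac_plane(points, iters=200, dist_thresh=0.03,
                 rng=np.random.default_rng(0)):
    """Return an inlier mask of the dominant plane via 3-point RANSAC."""
    best_mask = None
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p[1] - p[0], p[2] - p[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                       # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - p[0]) @ normal)
        mask = dist < dist_thresh
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

def detect_obstacles(points, voxel=0.05):
    # Voxel-grid downsampling: keep one point per occupied voxel.
    keys = np.floor(points / voxel).astype(int)
    _, idx = np.unique(keys, axis=0, return_index=True)
    pts = points[idx]
    ground = ransac_plane(pts)
    return pts[~ground]                    # off-plane points = obstacles

# Synthetic test: a flat floor plus a box-shaped obstacle above it.
rng = np.random.default_rng(1)
floor = np.c_[rng.uniform(-2, 2, 2000), rng.uniform(0, 4, 2000),
              rng.normal(0, 0.005, 2000)]
box = np.c_[rng.uniform(0.4, 0.8, 300), rng.uniform(1, 1.4, 300),
            rng.uniform(0.1, 0.6, 300)]
obstacles = detect_obstacles(np.vstack([floor, box]))
print(obstacles.shape[0], "obstacle points detected")
```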


Computer Vision and Pattern Recognition | 2017

Unambiguous Text Localization and Retrieval for Cluttered Scenes

Xuejian Rong; Chucai Yi; Yingli Tian

Text instances, as one category of self-describing objects, provide valuable information for understanding and describing cluttered scenes. In this paper, we explore the task of unambiguous text localization and retrieval: accurately localizing a specific targeted text instance in a cluttered image given a natural language description that refers to it. To address this task, we first propose a novel recurrent Dense Text Localization Network (DTLN) that sequentially decodes the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections. Our approach avoids repeated detections of the same text instance at multiple scales by recurrently memorizing previous detections, and effectively handles crowded text instances in close proximity. Second, we propose a Context Reasoning Text Retrieval (CRTR) model, which jointly encodes text instances and their context information through a recurrent network and ranks localized text bounding boxes with a context-compatibility scoring function. Quantitative evaluations on standard scene text localization benchmarks and a newly collected scene text retrieval dataset demonstrate the effectiveness and advantages of our models for both scene text localization and retrieval.
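
The CRTR scoring function is not specified in the abstract; purely as an illustration of ranking by context compatibility, the sketch below embeds a query and each text instance (its own feature plus a context feature) into a shared space and ranks boxes by cosine similarity. All encoders here are random stand-ins, not the paper's recurrent model.

```python
# Hypothetical sketch of the retrieval-side scoring described above: each
# localized text instance is scored by how compatible its own feature and
# its context feature are with the query description embedding. The linear
# encoders and feature dimensions are stand-ins, not the paper's CRTR model.
import numpy as np

rng = np.random.default_rng(0)
D = 64                                     # shared embedding dimension

W_inst = rng.normal(size=(D, 128)) * 0.1   # stand-in text-instance encoder
W_ctx = rng.normal(size=(D, 128)) * 0.1    # stand-in context encoder
W_query = rng.normal(size=(D, 300)) * 0.1  # stand-in phrase encoder

def score(query_vec, inst_feat, ctx_feat):
    """Context-compatibility score: query vs. instance-plus-context embedding."""
    q = W_query @ query_vec
    t = W_inst @ inst_feat + W_ctx @ ctx_feat   # jointly encoded instance
    return float(q @ t / (np.linalg.norm(q) * np.linalg.norm(t) + 1e-9))

# Rank detected boxes for one natural-language query.
query = rng.normal(size=300)                    # e.g. "the red EXIT sign"
boxes = [{"box": (10, 10, 80, 40), "inst": rng.normal(size=128),
          "ctx": rng.normal(size=128)} for _ in range(5)]
ranked = sorted(boxes, key=lambda b: score(query, b["inst"], b["ctx"]),
                reverse=True)
print("top box:", ranked[0]["box"])
```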


International Conference on Multimedia Retrieval | 2016

Region Trajectories for Video Semantic Concept Detection

Yuancheng Ye; Xuejian Rong; Xiaodong Yang; Yingli Tian

Recently, with the advent of the convolutional neural network (CNN), many CNN-based object detection algorithms have been proposed and have achieved encouraging results. In this paper, we introduce an algorithm based on region trajectories to establish connections between object localizations in individual frames and video sequences. To detect object regions in the individual frames of a video, we enhance the region-based convolutional neural network (R-CNN) by incorporating EdgeBoxes alongside Selective Search to generate candidate region proposals, and by combining GoogLeNet with AlexNet to improve the discriminability of the feature representations. The DeepMatching algorithm is employed in our region trajectory method to track points within the detected object regions. Experiments are conducted on the validation split of the TRECVID 2015 Localization dataset. As the experimental results demonstrate, our approach improves object detection accuracy in both temporal and spatial measurements.
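
As a simplified picture of building region trajectories, the sketch below greedily links per-frame detections across consecutive frames by box-center distance; the paper instead tracks points inside the regions with DeepMatching, so this is a deliberate simplification with an assumed distance threshold.

```python
# Hypothetical sketch of building region trajectories: per-frame detections
# are greedily linked across consecutive frames by box-center distance. The
# paper tracks points with DeepMatching; this is a deliberate simplification.
def center(b):
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def build_trajectories(frames, max_shift=20.0):
    """frames: list of per-frame box lists -> list of box trajectories."""
    trajectories = []
    for t, boxes in enumerate(frames):
        for box in boxes:
            cx, cy = center(box)
            # Extend the live trajectory whose last box center is nearest.
            live = [tr for tr in trajectories if tr["end"] == t - 1]
            def dist(tr):
                px, py = center(tr["boxes"][-1])
                return ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
            best = min(live, key=dist, default=None)
            if best is not None and dist(best) <= max_shift:
                best["boxes"].append(box)
                best["end"] = t
            else:
                trajectories.append({"boxes": [box], "end": t})
    return trajectories

frames = [[(10, 10, 50, 50)], [(14, 12, 54, 52)], [(18, 14, 58, 54)]]
print(len(build_trajectories(frames)), "trajectory")   # -> 1
```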


International Conference on Digital Signal Processing | 2016

Adaptive shrinkage cascades for blind image deconvolution

Xuejian Rong; Yingli Tian

Recently emerged discriminative non-blind deconvolution methods achieve excellent performance at only a fraction of the computational cost of their generative competitors, but their extension to blind deconvolution has seldom been addressed in a practical manner, although it is equally vital to image restoration. We propose a novel framework for effective blind image deblurring using patch-wise-prior-based adaptive shrinkage cascades, which introduces powerful internal patch-based image statistics into the non-blind shrinkage field formulation. The rich expressiveness of the internal patch prior brings instance-specific adaptivity to the alternating kernel refinement between neighboring shrinkage cascades, while a shrinkage model trained on a variety of natural image collections supplies the internal patch-wise prior inference with external information and superior efficiency.
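
The learned shrinkage cascades themselves are beyond a short sketch, but the alternating kernel-refinement structure the abstract describes can be illustrated with generic FFT-domain least-squares updates in place of the learned shrinkage steps. The regularization weights, kernel size, and synthetic test are assumptions, and the Wiener-style updates are a stand-in, not the paper's method.

```python
# Hypothetical sketch of the alternating structure described above: blind
# deblurring alternates between updating the latent image and refining the
# kernel. Both updates here are plain FFT-domain Wiener/least-squares steps,
# a deliberate stand-in for the paper's learned shrinkage cascades.
import numpy as np

def wiener_update(numer_spec, denom_spec, reg):
    """Regularized division in the Fourier domain."""
    return numer_spec / (denom_spec + reg)

def alternate_blind_deblur(blurred, ksize=15, iters=10, reg_x=1e-2, reg_k=1e-1):
    h, w = blurred.shape
    Y = np.fft.fft2(blurred)
    k = np.zeros((h, w))
    k[:ksize, :ksize] = 1.0 / ksize ** 2          # flat initial kernel guess
    for _ in range(iters):
        K = np.fft.fft2(k)
        # Latent-image step: x = argmin |k*x - y|^2 + reg_x |x|^2
        X = wiener_update(np.conj(K) * Y, np.abs(K) ** 2, reg_x)
        # Kernel-refinement step: k = argmin |k*x - y|^2 + reg_k |k|^2
        K = wiener_update(np.conj(X) * Y, np.abs(X) ** 2, reg_k)
        k = np.real(np.fft.ifft2(K))
        k[k < 0] = 0                              # enforce non-negativity
        if k.sum() > 0:
            k /= k.sum()                          # kernel sums to one
    return np.real(np.fft.ifft2(X)), k

# Synthetic usage: blur a random "image" and try to recover it.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
true_k = np.zeros((64, 64)); true_k[:5, :5] = 1 / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(true_k)))
restored, est_k = alternate_blind_deblur(blurred)
print("residual:", float(np.abs(restored - sharp).mean()))
```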


International Symposium on Visual Computing | 2016

Guided Text Spotting for Assistive Blind Navigation in Unfamiliar Indoor Environments

Xuejian Rong; Bing Li; J. Pablo Munoz; Jizhong Xiao; Aries Arditi; Yingli Tian

Scene text in indoor environments usually preserves and communicates important contextual information that can significantly enhance the independent travel of blind and visually impaired people. In this paper, we present an assistive text-spotting navigation system based on an RGB-D mobile device for blind or severely visually impaired people. Specifically, a novel spatial-temporal text localization algorithm is proposed to localize and prune text regions by integrating stroke-specific features with a subsequent text tracking process. The density of extracted text-specific feature points serves as an efficient text indicator, guiding the user closer to text-likely regions for better recognition performance. Detected text regions are then binarized and recognized by off-the-shelf optical character recognition methods. Significant non-text indicator signage can also be matched to provide additional environmental information. Both kinds of recognized results are then converted to speech feedback for user interaction. Our video text localization approach is quantitatively evaluated on the ICDAR 2013 dataset, and the experimental results demonstrate the effectiveness of the proposed method.
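
To illustrate the feature-point-density text indicator, the sketch below counts corner keypoints per image cell and turns the densest cell into a coarse guidance cue. FAST corners stand in for the paper's stroke-specific features, and the grid size and detector threshold are assumptions.

```python
# Hypothetical sketch of the guidance cue described above: the density of
# feature points per image cell serves as a cheap text-likelihood indicator.
# FAST corners stand in for the paper's stroke-specific features.
import cv2
import numpy as np

def text_density_map(gray, grid=(4, 4)):
    """Count keypoints per grid cell; dense cells are text-likely regions."""
    kps = cv2.FastFeatureDetector_create(threshold=20).detect(gray)
    rows, cols = grid
    density = np.zeros(grid, dtype=int)
    ch, cw = gray.shape[0] / rows, gray.shape[1] / cols
    for kp in kps:
        x, y = kp.pt
        density[min(int(y / ch), rows - 1), min(int(x / cw), cols - 1)] += 1
    return density

def guidance_cue(density):
    """Turn the densest cell into a coarse spoken direction (4 columns)."""
    r, c = np.unravel_index(np.argmax(density), density.shape)
    horiz = ["left", "center-left", "center-right", "right"][c]
    return f"Possible text region to the {horiz}"

# gray = cv2.imread("hallway.png", cv2.IMREAD_GRAYSCALE)
# print(guidance_cue(text_density_map(gray)))
```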

Collaboration


Dive into Xuejian Rong's collaborations.

Top Co-Authors

Yingli Tian | City University of New York
Bing Li | City University of New York
Jizhong Xiao | City University of New York
Aries Arditi | Lighthouse International
J. Pablo Munoz | City University of New York
Xiaodong Yang | City University of New York
Chucai Yi | City University of New York
Yang Xian | City University of New York
Q. Chen | City University of New York
Rai Munoz | City College of New York