Yirui Wu
Nanjing University
Publications
Featured research published by Yirui Wu.
ACM Multimedia | 2011
Limin Wang; Yirui Wu; Tong Lu; Kang Chen
In this paper, we present a novel approach to multiclass object detection that combines local appearances with contextual constraints. We first construct a multiclass Hough forest of local patches, whose randomization and discriminative training allow it to handle multiclass object deformations and local appearance variations. Then, in the object hypothesis space, a new multiclass context model captures relative location constraints, disambiguating appearance inputs in multiclass object detection. Finally, multiclass objects are detected efficiently with a greedy search algorithm. Experimental evaluations on two image data sets show that combining local appearances and context achieves state-of-the-art performance in multiclass object detection.
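The Hough-forest voting and greedy search described above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: patch votes are given directly as (x, y, weight) hypotheses, and detection greedily picks accumulator peaks while suppressing a disc around each one.

```python
import numpy as np

def hough_vote(votes, shape):
    """Accumulate patch votes (x, y, weight) into a Hough accumulator map."""
    acc = np.zeros(shape)
    for x, y, w in votes:
        acc[y, x] += w
    return acc

def greedy_detect(acc, n_objects, radius):
    """Greedily pick the strongest peaks, suppressing a disc around each."""
    acc = acc.copy()
    yy, xx = np.mgrid[0:acc.shape[0], 0:acc.shape[1]]
    detections = []
    for _ in range(n_objects):
        y, x = np.unravel_index(np.argmax(acc), acc.shape)
        detections.append((int(x), int(y)))
        # Suppress votes near the accepted hypothesis before the next pick.
        acc[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 0
    return detections
```

In the paper, the votes come from forest leaves and detection additionally respects the learned context model; the greedy peak-picking loop is the common core.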
IEEE Transactions on Image Processing | 2016
Yirui Wu; Palaiahnakote Shivakumara; Tong Lu; Chew Lim Tan; Michael Myer Blumenstein; Govindaraj Hemantha Kumar
Text recognition in video/natural scene images has gained significant attention in the field of image processing for many computer vision applications, as it is much more challenging than recognition in images with plain backgrounds. In this paper, we aim to restore complete character contours in video/scene images from gray values, in contrast to conventional techniques that take edge images/binary information as inputs for text detection and recognition. We explore and utilize the strengths of the zero crossing points given by the Laplacian to identify stroke candidate pixels (SPC). For each SPC pair, we propose new symmetry features based on gradient magnitude and Fourier phase angles to identify probable stroke candidate pairs (PSCP). The same symmetry properties are applied at the PSCP level to choose seed stroke candidate pairs (SSCP). Finally, an iterative algorithm is proposed for SSCP to restore complete character contours. Experimental results on benchmark databases, namely the ICDAR family of video and natural scene data sets, Street View Data, and the MSRA data set, show that the proposed technique outperforms existing techniques in terms of both quality measures and recognition rate. We also show that character contour restoration is effective for text detection in video and natural scene images.
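The first step above, locating zero crossings of the Laplacian as stroke candidates, can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: a 4-neighbour discrete Laplacian followed by a sign-change test against the right and lower neighbours.

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    p = np.pad(img.astype(float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4 * p[1:-1, 1:-1])

def zero_crossings(lap):
    """Mark pixels where the Laplacian changes sign against a neighbour."""
    zc = np.zeros(lap.shape, dtype=bool)
    zc[:, :-1] |= np.sign(lap[:, :-1]) * np.sign(lap[:, 1:]) < 0
    zc[:-1, :] |= np.sign(lap[:-1, :]) * np.sign(lap[1:, :]) < 0
    return zc
```

On a step edge the Laplacian flips sign on either side of the boundary, so the crossing localizes the stroke border at sub-threshold contrast; the paper then filters these candidates with the symmetry features.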
International Journal on Document Analysis and Recognition | 2015
Yirui Wu; Palaiahnakote Shivakumara; Wang Wei; Tong Lu; Umapada Pal
Thinning that preserves the visual topology of characters in video is challenging in the fields of document analysis and video text analysis due to low resolution and complex backgrounds. This paper proposes to explore the ring radius transform (RRT) to generate a radius map from the Canny edges of each input image to obtain its medial axis. A radius value in the radius map is the nearest distance to the edge pixels on contours. From the radius map, the method proposes a novel idea for identifying the medial axis (the middle pixels between two strokes) for arbitrary orientations of the character. Iterative maximal growing is then proposed to connect missing medial-axis pixels at junctions and intersections. Next, we perform histogram-based clustering on the color information of the medial axes to eliminate false medial-axis segments. The method finally restores the shape of the character from the radius values of the medial-axis pixels for recognition with the Google open-source OCR engine (Tesseract). The method has been tested on video, natural scene and handwritten characters from ICDAR 2013, SVT, arbitrarily oriented data from MSRA-TD500, multi-script character data and MPEG-7 object data to evaluate its performance at both the thinning and recognition levels. Experimental comparisons with the state-of-the-art methods show that the proposed method is generic and outperforms existing methods in terms of obtaining skeletons, preserving visual topology and recognition rate. The method is also robust to characters of arbitrary orientations.
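The radius map at the heart of the RRT can be sketched as a brute-force nearest-edge-distance computation; the medial axis then lies along local maxima of this map, midway between opposite stroke contours. This is an illustrative toy version (quadratic in image size), not the paper's transform.

```python
import numpy as np

def radius_map(edges):
    """Distance from every pixel to its nearest edge pixel (brute force).

    `edges` is a binary map, e.g. Canny output. Medial-axis candidates
    are local maxima of the returned map.
    """
    ey, ex = np.nonzero(edges)
    yy, xx = np.mgrid[0:edges.shape[0], 0:edges.shape[1]]
    # Pairwise distances from each pixel to each edge pixel, then min.
    d = np.sqrt((yy[..., None] - ey) ** 2 + (xx[..., None] - ex) ** 2)
    return d.min(axis=-1)
```

For a vertical stroke bounded by two edge columns, the map peaks on the centre column, which is exactly the middle-pixel property the method exploits.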
IEEE Transactions on Multimedia | 2017
Yirui Wu; Tong Lu; Zehuan Yuan; Hao Wang
Sculpture design is challenging due to the inherent difficulty of characterizing an artwork quantitatively, and little work has been done to assist sculpture design. We present a novel platform that helps sculptors in two stages: automatic sculpture reconstruction and free spectral-based sculpture pose editing. During sculpture reconstruction, we co-segment a sculpture from real scene images of different views through a two-label MRF framework, aiming to perform sculpture reconstruction efficiently. During sculpture pose editing, we automatically extract candidate editing points on the sculpture by searching in the spectrum of the Laplacian operator. After manually mapping the body joints of a sculptor to particular editing points, we construct a global Laplacian-based linear system that adopts the spectrum of the Laplacian operator and uses Kinect-captured body motions for real-time pose editing. The constructed system thus allows the sculptor to freely edit different kinds of sculpture artworks through Kinect. Experimental results demonstrate that our platform successfully assists sculptors in real-time pose editing.
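The Laplacian spectrum used for pose editing comes from the mesh's graph Laplacian, whose eigenvectors order the surface's deformation modes from low to high frequency. A minimal sketch, assuming the mesh is given simply as a vertex count and an edge list (the paper works on full 3D meshes):

```python
import numpy as np

def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A for a mesh given as an edge list."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1
        L[j, i] -= 1
        L[i, i] += 1
        L[j, j] += 1
    return L

# Eigenvectors of L, sorted by eigenvalue, give low-frequency modes first;
# extrema of these modes are natural candidates for editing points.
def spectrum(L):
    return np.linalg.eigh(L)
```

On a 3-vertex path the eigenvalues are 0, 1 and 3: the zero mode is the constant (rigid) mode, and higher eigenvalues correspond to finer-scale deformations.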
Conference on Multimedia Modeling | 2018
Lianglei Wei; Yirui Wu; Wenhai Wang; Tong Lu
Understanding the meanings of human actions from videos with embedded 3D skeleton data is a new challenge in content-oriented video analysis. In this paper, we propose to incorporate temporal patterns of joint positions into the currently popular Long Short-Term Memory (LSTM) based learning to improve both accuracy and robustness. Since 3D actions are formed by sub-actions, we first propose the Wavelet Temporal Pattern (WTP) to extract representations of temporal patterns for each sub-action by wavelet transform. Then, we define a novel Relation-aware LSTM (R-LSTM) structure to extract features by modeling the long-term spatio-temporal correlation between body parts. Regarding WTP and R-LSTM features as heterogeneous representations of human actions, we then fuse them with an Auto-Encoder network to define a more effective action descriptor for classification. Experimental results on the large-scale, challenging NTU-RGB+D dataset and several other datasets for 3D human action analysis, including UT-Kinect and Florence 3D Actions, demonstrate the effectiveness of the proposed method.
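The wavelet transform behind WTP can be illustrated with a single Haar analysis step on a joint-coordinate trajectory: the averages capture the coarse motion of a sub-action, the details its fine temporal pattern. This is a generic Haar sketch, not the paper's exact WTP construction.

```python
import numpy as np

def haar_step(x):
    """One level of the Haar wavelet transform.

    Returns (averages, details): pairwise means and half-differences of
    a trajectory whose length is even.
    """
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / 2.0
    det = (x[0::2] - x[1::2]) / 2.0
    return avg, det
```

The step is invertible (x[0::2] = avg + det, x[1::2] = avg - det), so stacking the detail coefficients across levels loses no information while separating temporal scales.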
International Joint Conference on Artificial Intelligence | 2017
Zehuan Yuan; Tong Lu; Yirui Wu
We address the problem of object co-segmentation in images. Object co-segmentation aims to segment common objects in images and has promising applications in AI agents. We solve it by proposing a co-occurrence map, which measures how likely an image region belongs to an object and also appears in other images. The co-occurrence map of an image is calculated by combining two parts: objectness scores of image regions and similarity evidences from object proposals across images. We introduce a deep-dense conditional random field framework to infer co-occurrence maps. Both similarity metric and objectness measure are learned end-to-end in one single deep network. We evaluate our method on two datasets and achieve competitive performance.
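The two ingredients of the co-occurrence map, objectness and cross-image similarity, can be combined in a toy form as follows. This is only a direct product sketch with illustrative inputs; the paper learns both terms end-to-end and infers the maps with a deep-dense CRF.

```python
import numpy as np

def cooccurrence_scores(feats_a, obj_a, feats_b):
    """Score regions of image A by objectness times best cross-image match.

    feats_a, feats_b: (n, d) L2-normalised region descriptors of two images;
    obj_a: (n,) objectness scores for image A's regions. All inputs here
    are assumed given, not learned.
    """
    sim = feats_a @ feats_b.T            # cosine similarity matrix
    best = sim.max(axis=1)               # strongest match in the other image
    return obj_a * best                  # object-like AND recurring
```

A region scores highly only when it both looks like an object and has a near-duplicate in the other image, which is the intuition the co-occurrence map formalizes.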
Conference on Multimedia Modeling | 2017
Yiyang Zhou; Wenhai Wang; Wenjie Guan; Yirui Wu; Heng Lai; Tong Lu; Min Cai
In this paper, we present a novel framework that drives automatic robotic grasping by matching camera-captured RGB-D data with 3D meshes, on which prior knowledge for grasping is pre-defined for each object type. The proposed framework consists of two modules: pre-defining grasping knowledge for each type of object shape on 3D meshes, and automatic robotic grasping by matching RGB-D data with the pre-defined 3D meshes. In the first module, we scan 3D meshes of typical object shapes and pre-define grasping regions on each 3D shape surface, which serve as the prior knowledge for guiding automatic robotic grasping. In the second module, for each RGB-D image captured by a depth camera, we recognize the 2D shape of the object with an SVM classifier and then segment it from the background using depth data. Next, we propose a new algorithm that matches the segmented RGB-D shape with the pre-defined 3D meshes to guide robotic self-location and grasping in an automatic way. Our experimental results show that the proposed framework is particularly useful for guiding camera-based robotic grasping.
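The shape-to-mesh matching in the second module can be sketched as mask overlap against rendered mesh silhouettes. This is an illustrative stand-in for the paper's matching algorithm: the silhouettes and the IoU criterion here are assumptions, not the published method.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def best_mesh_match(shape_mask, mesh_silhouettes):
    """Pick the pre-defined mesh whose silhouette best overlaps the
    depth-segmented 2D shape; returns (mesh index, IoU score)."""
    scores = [mask_iou(shape_mask, s) for s in mesh_silhouettes]
    return int(np.argmax(scores)), max(scores)
```

The selected mesh then carries the pre-defined grasping regions, which is what lets the robot reuse prior knowledge on a newly observed object.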
Pacific Rim Conference on Multimedia | 2018
Zhaoyang Liu; Yirui Wu; Yukai Ding; Jun Feng; Tong Lu
To minimize the damage brought by floods, researchers pay special attention to the problem of flood prediction. Multiple factors, including rainfall, soil category and the structure of the riverway, affect the prediction of sequential flow-rate values, but not all factors are informative for flood prediction. Extracting discriminative and informative features thus plays a key role in predicting flow rates. In this paper, we propose a context- and temporal-aware attention model (CT-LSTM) for flood prediction based on a quantity of collected flood factors. We build our model on top of Long Short-Term Memory (LSTM) networks; it selectively focuses on informative factors and pays different levels of attention to the outputs of different cells. The proposed CT-LSTM network assigns time-varying weights to the input factors at all cells of the LSTM network, and allocates temporally dependent weights to the output of each LSTM cell to boost prediction performance. Experimental results on a benchmark flood dataset against several comparative methods demonstrate the effectiveness of the proposed CT-LSTM network for flood prediction.
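The factor-weighting idea can be sketched as a softmax attention step applied to the input vector before it enters an LSTM cell. This is a generic illustration with given relevance logits; in CT-LSTM the weights are learned and vary over time.

```python
import numpy as np

def factor_attention(factors, scores):
    """Re-weight input factors with softmax attention.

    factors: (n,) factor values at one time step (e.g. rainfall, soil
    moisture); scores: (n,) relevance logits, here assumed given rather
    than learned. Returns (attended factors, attention weights).
    """
    w = np.exp(scores - np.max(scores))   # stable softmax
    w /= w.sum()
    return w * factors, w
```

Informative factors receive larger weights and dominate the cell input, while uninformative ones are suppressed, which is the behaviour the abstract describes at both the input and output sides of the network.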
Conference on Multimedia Modeling | 2018
Wenhai Wang; Yirui Wu; Palaiahnakote Shivakumara; Tong Lu
Text detection in natural and video scene images is still considered challenging due to the unpredictable nature of scene text. This paper presents a new method based on the Cloud of Line Distribution (COLD) and a random forest classifier for text detection in both natural and video images. The proposed method extracts the unique shapes of text components by studying the relationship between dominant points, such as straight or cursive, over the contours of text components; this representation in the polar domain is called COLD. We consider edge components as text candidates if the edge components in the Canny and Sobel maps of an input image share the COLD property. For each text candidate, we further study its COLD distribution at the component level to extract statistical features and angle-oriented features. Next, these features are fed to a random forest classifier to eliminate false text candidates, which results in representatives. We then group the representatives into text lines based on the distances between edge components in the edge image. The statistical and angle-oriented features are finally extracted at the word level to eliminate false positives, which results in text detection. The proposed method is tested on standard databases, namely SVT, ICDAR 2015 scene, ICDAR 2013 scene and video databases, to show its effectiveness and usefulness compared with existing methods.
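A COLD-style descriptor can be sketched as a polar-domain histogram over the vectors joining pairs of dominant contour points. This is an illustrative reduction, assuming the dominant points are already given; bin counts and normalization are arbitrary choices here.

```python
import numpy as np

def cold_histogram(points, n_rho=4, n_theta=8, rho_max=None):
    """Histogram, in polar coordinates, of vectors between dominant points.

    points: (n, 2) array of (x, y) dominant points on a contour.
    Returns an (n_rho, n_theta) normalised histogram: straight strokes
    concentrate mass in few angle bins, cursive shapes spread it out.
    """
    pts = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)
    d = pts[j] - pts[i]                      # all pairwise line vectors
    rho = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])
    if rho_max is None:
        rho_max = rho.max()
    hist, _, _ = np.histogram2d(rho, theta, bins=[n_rho, n_theta],
                                range=[[0, rho_max], [-np.pi, np.pi]])
    return hist / hist.sum()
```

The distribution over these line vectors is what distinguishes the regular geometry of text components from background clutter before the random forest stage.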
Pacific Rim Conference on Multimedia | 2017
Wenhai Wang; Yirui Wu; Shivakumara Palaiahnakote; Tong Lu; Jun Liu
Detecting arbitrarily oriented text in scene and license plate images is challenging due to multiple adverse factors caused by images from diversified applications. This paper proposes a novel idea of extracting the Cloud of Line Distribution (COLD) for text candidates given by Extremal Regions (ER). The features extracted by COLD are fed to a random forest to label character components. The character components are then grouped according to the probability distribution of nearest-neighbor components, which results in text lines. The proposed method is demonstrated on standard databases of natural scene images (ICDAR 2015), video images (ICDAR 2015) and license plates. Experimental results and a comparative study show that the proposed method outperforms existing methods in terms of invariance to rotation, script and application.
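The grouping of labeled character components into text lines can be illustrated with a greedy nearest-neighbor pass over component centres. This is a hypothetical simplification: the paper groups by a probability distribution over neighbours, whereas here fixed horizontal and vertical gap thresholds stand in for it.

```python
import numpy as np

def group_text_lines(centers, max_gap):
    """Greedy left-to-right grouping of character centres into text lines.

    centers: list of (x, y) component centres; a component joins the
    current line when it is horizontally close and vertically aligned
    with its left neighbour. Returns lists of indices into `centers`.
    """
    order = np.argsort([c[0] for c in centers])
    lines, current = [], [int(order[0])]
    for a, b in zip(order, order[1:]):
        (xa, ya), (xb, yb) = centers[a], centers[b]
        if xb - xa <= max_gap and abs(yb - ya) <= max_gap / 2:
            current.append(int(b))           # same line: close and aligned
        else:
            lines.append(current)            # gap too large: start new line
            current = [int(b)]
    lines.append(current)
    return lines
```

Replacing the fixed thresholds with the learned neighbour distribution is what lets the published method stay invariant to rotation and script.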