Trung Quy Phan
National University of Singapore
Publications
Featured research published by Trung Quy Phan.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Palaiahnakote Shivakumara; Trung Quy Phan; Chew Lim Tan
In this paper, we propose a method based on the Laplacian in the frequency domain for video text detection. Unlike many other approaches, which assume that text is horizontally oriented, our method is able to handle text of arbitrary orientation. The input image is first filtered with a Fourier-Laplacian filter. K-means clustering is then used to identify candidate text regions based on the maximum difference feature. The skeleton of each connected component helps to separate the different text strings from each other. Finally, text string straightness and edge density are used for false-positive elimination. Experimental results show that the proposed method is able to handle graphics text and scene text of both horizontal and non-horizontal orientation.
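As a rough illustration of the pipeline in this abstract, the Python sketch below applies a Laplacian filter in the frequency domain and clusters a maximum-difference feature with K-means. The window size, cluster count and text-cluster selection rule are illustrative assumptions, not the authors' exact parameters.

```python
# Hedged sketch: Fourier-domain Laplacian filtering + K-means clustering
# of a maximum-difference feature. All parameters are illustrative guesses.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def fourier_laplacian(gray):
    """Apply the Laplacian in the frequency domain via the FFT."""
    f = np.fft.fft2(gray.astype(np.float64))
    u = np.fft.fftfreq(gray.shape[0]).reshape(-1, 1)
    v = np.fft.fftfreq(gray.shape[1]).reshape(1, -1)
    lap = -4.0 * np.pi ** 2 * (u ** 2 + v ** 2)  # Laplacian transfer function
    return np.real(np.fft.ifft2(f * lap)).astype(np.float32)

def candidate_text_mask(gray, win=5):
    filtered = fourier_laplacian(gray)
    # Local max minus local min as a stand-in for the maximum-difference feature.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (win, win))
    md = cv2.dilate(filtered, kernel) - cv2.erode(filtered, kernel)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(md.reshape(-1, 1))
    labels = labels.reshape(gray.shape)
    # Assume the cluster with the larger mean feature value is the text cluster.
    text = int(md[labels == 1].mean() > md[labels == 0].mean())
    return (labels == text).astype(np.uint8) * 255
```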
International Conference on Document Analysis and Recognition | 2009
Trung Quy Phan; Palaiahnakote Shivakumara; Chew Lim Tan
In this paper, we propose an efficient text detection method based on the Laplacian operator. The maximum gradient difference value is computed for each pixel in the Laplacian-filtered image. K-means is then used to classify all the pixels into two clusters: text and non-text. For each candidate text region, the corresponding region in the Sobel edge map of the input image undergoes projection profile analysis to determine the boundaries of the text blocks. Finally, we employ empirical rules based on geometrical properties to eliminate false positives. Experimental results show that the proposed method is able to detect text of different fonts, contrasts and backgrounds. Moreover, it outperforms three existing methods in terms of detection and false-positive rates.
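To make the projection-profile step concrete, here is a small Python sketch that tightens a candidate box using row and column sums of the Sobel edge magnitude; the threshold fraction is an assumed parameter rather than anything specified in the paper.

```python
# Hedged sketch: boundary refinement via Sobel projection profiles.
import numpy as np
import cv2

def refine_block(gray, x, y, w, h, frac=0.1):
    """Tighten a candidate box (x, y, w, h) using projection profiles."""
    roi = gray[y:y + h, x:x + w]
    gx = cv2.Sobel(roi, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(roi, cv2.CV_32F, 0, 1)
    edges = cv2.magnitude(gx, gy)
    rows = edges.sum(axis=1)   # horizontal projection profile
    cols = edges.sum(axis=0)   # vertical projection profile

    def bounds(profile):
        # Keep the span where the profile exceeds a fraction of its peak.
        on = np.flatnonzero(profile > frac * profile.max())
        return (on[0], on[-1] + 1) if on.size else (0, len(profile))

    r0, r1 = bounds(rows)
    c0, c1 = bounds(cols)
    return x + c0, y + r0, c1 - c0, r1 - r0
```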
Pattern Recognition | 2010
Palaiahnakote Shivakumara; Weihua Huang; Trung Quy Phan; Chew Lim Tan
Detection of both scene text and graphics text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding of video. In this paper, we explore a new idea of classifying low-contrast and high-contrast video images in order to detect accurate boundaries of the text lines in video images. In this work, high contrast refers to sharpness, while low contrast refers to dim intensity values in the video images. The method introduces heuristic rules based on a combination of filters and edge analysis for the classification. The heuristic rules are derived from the observation that the number of Sobel edge components exceeds the number of Canny edge components in high-contrast video images, and vice versa for low-contrast video images. To demonstrate the use of this classification for video text detection, we implement a method based on Sobel edges and texture features for detecting text in video images. Experiments are conducted on video images containing both graphics text and scene text with different fonts, sizes, languages and backgrounds. The results show that the proposed method outperforms existing methods in terms of detection rate, false alarm rate, misdetection rate and inaccurate boundary rate.
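The Sobel-versus-Canny heuristic is simple enough to sketch directly; in the Python fragment below, the magnitude and hysteresis thresholds are assumptions, not the paper's values.

```python
# Hedged sketch: frame contrast classification by comparing Sobel and
# Canny edge-pixel counts. Thresholds are illustrative assumptions.
import numpy as np
import cv2

def classify_contrast(gray):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    sobel_edges = cv2.magnitude(gx, gy) > 100    # assumed magnitude threshold
    canny_edges = cv2.Canny(gray, 100, 200) > 0  # assumed hysteresis thresholds
    # Heuristic from the paper: more Sobel edge pixels than Canny edge
    # pixels indicates a high-contrast (sharp) frame, and vice versa.
    return "high" if sobel_edges.sum() > canny_edges.sum() else "low"
```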
International Conference on Document Analysis and Recognition | 2009
Palaiahnakote Shivakumara; Trung Quy Phan; Chew Lim Tan
In this paper, we propose a new method based on the wavelet transform, statistical features and central moments for detecting both graphics and scene text in video images. The method computes features from the LH, HL and HH subbands of a single-level wavelet decomposition, and the computed features are fed to K-means clustering to separate text pixels from the background of the image. The average of the wavelet subbands and the output of K-means clustering help to identify true text pixels in the image. The text blocks are detected by analyzing projection profiles. Finally, we introduce a few heuristics to eliminate false positives from the image. The robustness of the proposed method is tested by conducting experiments on a variety of images with low contrast, complex backgrounds, and different fonts and sizes of text. The experimental results show that the proposed method outperforms existing methods in terms of detection rate, false-positive rate and misdetection rate.
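A minimal Python sketch of the subband-averaging and clustering step, using PyWavelets; the wavelet family ('haar') and the rule for picking the text cluster are assumptions.

```python
# Hedged sketch: single-level DWT subbands averaged and clustered with
# K-means. 'haar' and the text-cluster rule are illustrative choices.
import numpy as np
import pywt
from sklearn.cluster import KMeans

def text_pixel_mask(gray):
    _, (lh, hl, hh) = pywt.dwt2(gray.astype(np.float32), 'haar')
    avg = (np.abs(lh) + np.abs(hl) + np.abs(hh)) / 3.0
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(avg.reshape(-1, 1))
    labels = labels.reshape(avg.shape)
    # Take the cluster with the larger mean subband energy as text.
    text = int(avg[labels == 1].mean() > avg[labels == 0].mean())
    return labels == text   # half-resolution mask; upsample as needed
```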
IEEE Transactions on Circuits and Systems for Video Technology | 2010
Palaiahnakote Shivakumara; Trung Quy Phan; Chew Lim Tan
In this paper, we propose new Fourier-statistical features (FSF) in RGB space for detecting text in video frames with unconstrained backgrounds, different fonts, different scripts and different font sizes. This paper consists of two parts: automatic classification of text frames from a large database of text and non-text frames, and FSF in RGB space for text detection in the classified text frames. For text frame classification, we present novel features based on three visual cues, namely, sharpness in filter-edge maps, straightness of the edges, and proximity of the edges, to identify a true text frame. For text detection in video frames, we present new Fourier-transform-based features in RGB space combined with statistical features; the computed FSF features from the RGB bands are subjected to K-means clustering to classify text pixels from the background of the frame. Text blocks of the classified text pixels are determined by analyzing the projection profiles. Finally, we introduce a few heuristics to eliminate false positives from the frame. The robustness of the proposed approach is tested by conducting experiments on a variety of frames with low contrast, complex backgrounds, and different fonts and sizes of text. Both our own test dataset and a publicly available dataset are used for the experiments. The experimental results show that the proposed approach is superior to existing approaches in terms of detection rate, false-positive rate and misdetection rate.
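As a hedged illustration, the snippet below computes block-wise Fourier magnitude statistics per RGB band and clusters the resulting vectors with K-means; the block size and the choice of statistics (mean and energy) are assumptions standing in for the paper's exact FSF definition.

```python
# Hedged sketch: block-wise Fourier magnitude statistics per RGB band,
# clustered with K-means. Block size and statistics are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def fsf_features(rgb, block=16):
    h, w, _ = rgb.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            vec = []
            for c in range(3):
                mag = np.abs(np.fft.fft2(rgb[y:y + block, x:x + block, c]))
                vec += [mag.mean(), (mag ** 2).mean()]  # mean and energy
            feats.append(vec)
            coords.append((y, x))
    return np.array(feats), coords

def classify_blocks(rgb):
    feats, coords = fsf_features(rgb)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    return labels, coords   # one text/non-text label per block
```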
International Conference on Document Analysis and Recognition | 2009
Palaiahnakote Shivakumara; Trung Quy Phan; Chew Lim Tan
Text detection in video images has received increasing attention, particularly scene text detection, as it plays a vital role in video indexing and information retrieval. This paper proposes a new and robust gradient difference technique for detecting both graphics and scene text in video images. The technique introduces the concept of zero crossings to determine the bounding boxes of the detected text lines, rather than using the conventional projection-profile-based method, which fails to fix bounding boxes when there is no proper spacing between the detected text lines. We demonstrate the capability of the proposed technique by conducting experiments on video images containing both graphics text and scene text with different font shapes and sizes, languages, text directions, backgrounds and contrasts. Our experimental results show that the proposed technique outperforms existing methods in terms of detection rate on a large video image database.
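One plausible reading of the zero-crossing idea is sketched below: the row profile of a gradient-difference map is mean-centred, and its sign changes mark candidate line boundaries. The smoothing width, and indeed this whole formulation, are illustrative assumptions rather than the paper's exact procedure.

```python
# Hedged sketch: zero-crossing boundary detection on the row profile of a
# gradient-difference map. Smoothing width is an illustrative assumption.
import numpy as np

def line_boundaries(gd_map, smooth=9):
    profile = gd_map.sum(axis=1).astype(np.float64)
    kernel = np.ones(smooth) / smooth
    profile = np.convolve(profile, kernel, mode='same')
    centred = profile - profile.mean()
    # Zero crossings: indices where the centred profile changes sign.
    crossings = np.flatnonzero(np.diff(np.sign(centred)) != 0)
    return crossings  # pair consecutive crossings to form line boxes
```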
International Conference on Computer Vision | 2013
Trung Quy Phan; Palaiahnakote Shivakumara; Shangxuan Tian; Chew Lim Tan
This paper presents an approach to text recognition in natural scene images. Unlike most existing works, which assume that texts are horizontal and fronto-parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale-Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
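A compact sketch of the bag-of-keypoints descriptor in Python/OpenCV: SIFT is computed on a dense grid and quantized against a pre-trained vocabulary. The grid step, patch size and nearest-word assignment are assumptions; the vocab array stands for a k-means codebook trained offline, which the paper presupposes.

```python
# Hedged sketch: dense-SIFT bag-of-keypoints histogram for one character
# image. Grid step, patch size and vocabulary are illustrative assumptions.
import numpy as np
import cv2

def bok_histogram(gray, vocab, step=4, size=8):
    """vocab: (K, 128) array of visual words from k-means on training SIFT."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), size)
           for y in range(0, gray.shape[0], step)
           for x in range(0, gray.shape[1], step)]
    _, desc = sift.compute(gray, kps)
    # Assign each descriptor to its nearest visual word.
    d2 = ((desc[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)   # normalized word histogram
```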
IEEE Transactions on Circuits and Systems for Video Technology | 2013
Palaiahnakote Shivakumara; Trung Quy Phan; Shijian Lu; Chew Lim Tan
Text detection in videos is challenging due to the low resolution and complex backgrounds of videos. Besides, arbitrary orientations of scene text lines in video make the problem more complex and challenging. This paper presents a new method that extracts text lines of any orientation based on gradient vector flow (GVF) and neighbor component grouping. The GVF of edge pixels in the Sobel edge map of the input frame is explored to identify the dominant edge pixels which represent text components. The method extracts edge components corresponding to dominant pixels in the Sobel edge map, which we call text candidates (TC) of the text lines. We propose two grouping schemes. The first finds nearest neighbors based on geometrical properties of the TC to group broken segments and neighboring characters, which results in word patches. The end and junction points of the skeletons of the word patches are considered to eliminate false positives, which outputs the candidate text components (CTC). The second is based on the direction and size of the CTC to extract neighboring CTC and to restore missing CTC, which enables arbitrarily oriented text line detection in video frames. Experimental results on different datasets, including arbitrarily oriented text data, non-horizontal and horizontal text data, Hua's data and ICDAR-03 data (camera images), show that the proposed method outperforms existing methods in terms of recall, precision and F-measure.
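GVF itself follows the standard Xu-Prince diffusion; the sketch below is that textbook formulation rather than the authors' implementation, with mu and the iteration count chosen arbitrarily.

```python
# Hedged sketch: gradient vector flow (GVF) by explicit diffusion
# (Xu-Prince). mu and the iteration count are illustrative assumptions.
import numpy as np
import cv2

def gvf(edge_map, mu=0.2, iters=80):
    f = edge_map.astype(np.float32)
    f = (f - f.min()) / max(f.max() - f.min(), 1e-6)  # normalize edge map
    fx = cv2.Sobel(f, cv2.CV_32F, 1, 0, ksize=3)
    fy = cv2.Sobel(f, cv2.CV_32F, 0, 1, ksize=3)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        # u_t = mu * Laplacian(u) - |grad f|^2 * (u - fx), likewise for v.
        u += mu * cv2.Laplacian(u, cv2.CV_32F) - mag2 * (u - fx)
        v += mu * cv2.Laplacian(v, cv2.CV_32F) - mag2 * (v - fy)
    return u, v  # flow components; large |(u, v)| marks dominant edge pixels
```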
ACM Multimedia | 2012
Trung Quy Phan; Palaiahnakote Shivakumara; Chew Lim Tan
The problem of text detection in natural scene images is challenging because of the unconstrained sizes, colors, backgrounds and alignments of the characters. This paper proposes novel symmetry features for this task. Within a text line, the intra-character symmetry captures the correspondence between the inner contour and the outer contour of a character while the inter-character symmetry helps to extract information from the gap region between two consecutive characters. A formulation based on Gradient Vector Flow is used to detect both types of symmetry points. These points are then grouped into text lines using the consistency in sizes, colors, and stroke and gap thickness. Therefore, unlike most existing methods which use only character features, our method exploits both the text features and the gap features to improve the detection result. Experimentally, our method compares well to the state-of-the-art on public datasets for natural scenes and street-level images, an emerging category of image data. The proposed technique can be used in a wide range of multimedia applications such as content-based image/video retrieval, mobile visual search and sign translation.
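As a loose illustration of the symmetry cue, the fragment below keeps a pixel when the horizontal GVF components a few pixels to its left and right converge on it, reusing the gvf() helper from the sketch under the previous entry; the probe distance d is an assumption, and this is a simplification of the paper's formulation.

```python
# Hedged sketch: symmetry points as locations where the horizontal GVF
# components on either side converge. Probe distance d is an assumption.
import numpy as np

def symmetry_points(u, d=3):
    """u: x-component of the GVF field, e.g. from gvf() above."""
    h, w = u.shape
    sym = np.zeros((h, w), dtype=bool)
    left = u[:, : w - 2 * d]   # flow d pixels to the left of each candidate
    right = u[:, 2 * d:]       # flow d pixels to the right
    # Converging x-components: left side points right, right side points left.
    sym[:, d : w - d] = (left > 0) & (right < 0)
    return sym
```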
International Conference on Pattern Recognition | 2010
Palaiahnakote Shivakumara; Trung Quy Phan; Chew Lim Tan
Automatic text detection in video is an important task for efficient and accurate indexing and retrieval of multimedia data, for applications such as event identification and event boundary identification. This paper presents a new method combining wavelet decomposition and color features, namely R, G and B. The wavelet decomposition is applied to the three color bands separately to obtain the three high-frequency subbands (LH, HL and HH), and the average of the three subbands is then computed for each color band to enhance the text pixels in the video frame. To take advantage of both wavelet and color information, we further take the average of the three average images (AoA) obtained in the former step to increase the gap between text and non-text pixels. Our previous Laplacian method is then employed on the AoA image for text detection. The proposed method is evaluated on a large dataset which includes publicly available data, non-text data and ICDAR-03 data. A comparative study with existing methods shows that the results of the proposed method are encouraging and useful.
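The average-of-averages (AoA) computation is straightforward to sketch with PyWavelets; the wavelet family ('haar') is an assumption.

```python
# Hedged sketch: average-of-averages (AoA) over wavelet subbands and RGB
# channels. The wavelet family is an illustrative assumption.
import numpy as np
import pywt

def average_of_averages(rgb):
    channel_avgs = []
    for c in range(3):
        _, (lh, hl, hh) = pywt.dwt2(rgb[:, :, c].astype(np.float32), 'haar')
        # Average the three high-frequency subbands for this color band.
        channel_avgs.append((np.abs(lh) + np.abs(hl) + np.abs(hh)) / 3.0)
    # Average the three per-channel averages: input to the Laplacian detector.
    return sum(channel_avgs) / 3.0
```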