Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Shivakumara Palaiahnakote is active.

Publication


Featured research published by Shivakumara Palaiahnakote.


Archive | 2014

Video Text Detection

Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan; Wenyin Liu

This book presents a systematic introduction to the latest developments in video text detection. Opening with a discussion of the underlying theory and a brief history of video text detection, the text proceeds to cover pre-processing and post-processing techniques, character segmentation and recognition, identification of non-English scripts, techniques for multi-modal analysis and performance evaluation. The detection of text from both natural video scenes and artificially inserted captions is examined. Various applications of the technology are also reviewed, from license plate recognition and road navigation assistance, to sports analysis and video advertising systems. Features: explains the fundamental theory in a succinct manner, supplemented with references for further reading; highlights practical techniques to help the reader understand and develop their own video text detection systems and applications; serves as an easy-to-navigate reference, presenting the material in self-contained chapters.


Expert Systems With Applications | 2016

Modeling spatial layout for scene image understanding via a novel multiscale sum-product network

Zehuan Yuan; Hao Wang; Limin Wang; Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan

A new deep architecture, MSPN, is proposed for image segmentation. Multiscale unary potentials are used to model image spatial layouts. A superpixel-based refinement method is used to improve the parsing results. Semantic image segmentation is challenging due to the large intra-class variations and the complex spatial layouts inside natural scenes. This paper investigates this problem by designing a new deep architecture, called multiscale sum-product network (MSPN), which uses multiscale unary potentials as inputs and models the spatial layouts of image content in a hierarchical manner. That is, the proposed MSPN models the joint distribution of multiscale unary potentials and object classes instead of the single unary potentials used in popular settings. In addition, MSPN characterizes scene spatial layouts in a fine-to-coarse manner to enforce consistency in labeling. Multiscale unary potentials at different scales thus help overcome semantic ambiguities caused by evaluating only single local regions, while long-range spatial correlations further refine the image labeling and higher-order terms pose constraints among labels. In this way, multiscale unary potentials, long-range spatial correlations, and higher-order priors are all modeled within a unified framework in MSPN. We conduct experiments on two challenging benchmarks, the MSRC-21 dataset and the SIFT Flow dataset. The results demonstrate the superior performance of our method compared with previous graphical models for understanding scene images.
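
The abstract mentions a superpixel-based refinement of the parsing results. The following is a minimal sketch of that generic refinement idea only (not the MSPN architecture itself), assuming `labels` is a per-pixel class map predicted upstream and using SLIC superpixels with a majority vote inside each superpixel; the paper's actual refinement may differ.

```python
# Sketch of superpixel-based label refinement: each SLIC superpixel adopts the
# majority class of the per-pixel predictions inside it, snapping noisy label
# maps to image boundaries. Assumes an RGB image and an integer label map.
import numpy as np
from skimage.segmentation import slic

def refine_with_superpixels(image: np.ndarray, labels: np.ndarray,
                            n_segments: int = 400) -> np.ndarray:
    superpixels = slic(image, n_segments=n_segments, compactness=10)
    refined = labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        refined[mask] = np.bincount(labels[mask]).argmax()  # majority vote
    return refined
```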


International Symposium on Intelligent Signal Processing and Communication Systems | 2014

Multi-oriented text detection for intra-frame in H.264/AVC video

Kazuki Minemura; Shivakumara Palaiahnakote; KokSheik Wong

Text detection in compressed video has received much attention in recent years due to the effectiveness of DCT coefficients and motion vectors in realizing several applications. In this paper, a new text detection method, which utilizes AC coefficients in H.264/AVC compressed video, is proposed. The median deviation of the coefficients from a specific subband is first computed, and then k-means clustering and morphological operations are applied to classify the text candidates. The majority orientation is considered to eliminate false-positive candidate groups that have different orientations. Local block energy information is extracted to obtain the final text candidates. Experimental results show that the proposed method outperforms the existing methods in either computational time or accuracy when detecting horizontal text. Furthermore, for non-horizontal text, the proposed method is superior to all the conventional methods considered.
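
The following is a rough sketch of the core steps named in the abstract (median-deviation feature, two-class k-means, morphological cleanup), not the authors' implementation. It assumes a hypothetical array `ac_coeffs` holding the AC coefficients of each intra block; extracting these from a real H.264/AVC bitstream requires a decoder and is outside the sketch, and the orientation and block-energy filtering steps are omitted.

```python
# Minimal sketch: per-block median-deviation feature from AC coefficients,
# two-class k-means to separate text-like blocks, morphological closing.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def text_candidate_mask(ac_coeffs: np.ndarray) -> np.ndarray:
    """ac_coeffs: (rows, cols, n_ac) AC coefficients of each intra block."""
    rows, cols, _ = ac_coeffs.shape
    # Median deviation per block: text blocks carry more high-frequency energy.
    med = np.median(ac_coeffs, axis=2, keepdims=True)
    feature = np.mean(np.abs(ac_coeffs - med), axis=2)

    # Two-class k-means on the per-block feature: text-like vs. background.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        feature.reshape(-1, 1)).reshape(rows, cols)
    means = [feature[labels == k].mean() for k in (0, 1)]
    mask = (labels == int(np.argmax(means))).astype(np.uint8) * 255

    # Morphological closing merges fragmented blocks into candidate regions.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```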


Archive | 2014

Introduction to Video Text Detection

Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan; Wenyin Liu

Text plays a dominant role in video viewing and understanding, as it carries rich and important information relevant to the video contents. Studies have shown that humans often attend to text before other objects in a video, since text helps in grasping semantics relevant to the content of the video. With this in mind, this chapter introduces research in video text detection. It first reviews the relevant literature and then discusses the characteristics and difficulties of video text detection faced by the majority of the methods under review. Various issues, such as the low resolution of video images, the presence of both caption and scene text in video, and variations in background complexity, are examined. The chapter also presents a brief historical overview to show how video text detection evolved from the field of document image analysis and how the document analysis community has explored methods proposed in different fields, including image processing, pattern recognition, computer vision, and artificial intelligence, to find solutions to text detection in video. Finally, the chapter discusses potential applications of video text detection.


Archive | 2014

Text Detection in Multimodal Video Analysis

Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan; Wenyin Liu

Most video streams involve more than one modality for conveying hints about the nature of the underlying content. In general, video data comprise three low-level modalities: the visual modality (i.e., visual objects, motions, and scene changes), the auditory modality, which can be structured foreground or unstructured background sound in audio sources, and the textual modality, such as natural video text or artificially superimposed dialogue. The concurrent analysis of multiple information modalities has thus emerged as a potentially more efficient way to access video content automatically, especially in recent years. This chapter introduces text detection in multimodal video analysis from a new perspective, as follows. We first introduce the relevance of the different modalities existing in video, namely the auditory, visual, and textual modalities. General multimodal data fusion schemes for video analysis are discussed, and two examples of connecting video text with other modalities are given. Then we give a brief overview of recent multimodal correlation models that integrate the textual modality of video. Next, we discuss multimodal video applications such as text detection and OCR for person identification in broadcast video, multimodal content-based structure analysis of karaoke, text detection for multimodal movie abstraction and retrieval, and web video classification through the text modality. In summary, text detection in multimodal video analysis remains an open problem and will become more important in the next decade.


Archive | 2014

Video Caption Detection

Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan; Wenyin Liu

Video contains two types of text. The first type is caption text: edited text or graphics text artificially superimposed on the video and relevant to its content. The second type is scene text: naturally occurring text, usually embedded in objects in the video. This chapter focuses on the state-of-the-art methods developed for caption text detection in video. According to the literature, current methods can be classified into two broad categories, namely feature-based methods and machine-learning-based methods. The feature-based methods described in this chapter exploit the following features for text detection: image edges obtained by means of gradients and filters, textures obtained by combining a variety of image texture measures, connected components obtained by analyzing skeletons extracted from the image, and frequency-domain features obtained by performing a Fourier transform. The machine learning methods presented in this chapter, on the other hand, make use of classifiers such as support vector machines, neural networks, and Bayesian classifiers.
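
As an illustration of the simplest feature-based idea surveyed in the chapter, the sketch below locates caption candidates where the local density of edges is high, since superimposed captions produce dense, regular edges. This is a generic edge-density detector written for illustration under that assumption, not any specific method from the chapter.

```python
# Generic edge-density caption-candidate detector (illustrative only).
import cv2
import numpy as np

def caption_candidates(frame_bgr: np.ndarray, win: int = 15,
                       density_thresh: float = 0.25) -> np.ndarray:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = (cv2.Canny(gray, 100, 200) > 0).astype(np.float32)
    # Fraction of edge pixels in a win x win neighbourhood around each pixel.
    density = cv2.boxFilter(edges, ddepth=-1, ksize=(win, win))
    mask = (density > density_thresh).astype(np.uint8) * 255
    # Close small gaps so characters of one caption merge into a single block.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (win, 3))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```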


Archive | 2014

Character Segmentation and Recognition

Tong Lu; Shivakumara Palaiahnakote; Chew Lim Tan; Wenyin Liu

This chapter presents methods for segmenting characters from text lines and recognizing video characters. Character segmentation from video text lines detected by a video text detection method is not as easy as segmenting characters from scanned document images, due to the low resolution and complex background of video. The chapter first presents a method for word segmentation based on a combination of Fourier and moment features. The segmented words are then used for character segmentation using the top and bottom profile features of the words. The chapter also presents a method that does not require word segmentation; instead, it segments characters from text lines directly by exploring gradient vector flow (GVF) to identify the space between words. Further, the chapter introduces a recognition method that does not use an OCR engine; it proposes structural features based on eight-directional sectors to facilitate character recognition by calculating representatives for each class of characters.
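
A simplified sketch of the projection-profile idea mentioned above is given below: columns of a binarized word image whose ink count drops to zero are treated as gaps separating characters. The chapter's actual methods use Fourier/moment features and gradient vector flow, which are not reproduced here.

```python
# Profile-based character splitting on a binarized word image (sketch only).
import numpy as np

def split_characters(binary_word: np.ndarray, min_gap: int = 2):
    """binary_word: 2D array, 1 = text pixel, 0 = background."""
    ink_per_column = binary_word.sum(axis=0)
    gaps = ink_per_column == 0
    segments, start = [], None
    for x, is_gap in enumerate(gaps):
        if not is_gap and start is None:
            start = x
        elif is_gap and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(gaps)))
    # Merge segments separated by gaps narrower than min_gap (broken strokes).
    merged = []
    for s, e in segments:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return [binary_word[:, s:e] for s, e in merged]
```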


PLOS ONE | 2018

Residual-based approach for authenticating pattern of multi-style diacritical Arabic texts

Saqib Hakak; Amirrudin Kamsin; Shivakumara Palaiahnakote; Omar Tayan; Mohd Yamani Idna Idris; Khir Zuhaili Abukhir

Arabic script is highly sensitive to changes in meaning with respect to the accurate arrangement of diacritics and other related symbols. The most sensitive Arabic text available online is the Digital Qur'an, the sacred book of revelation in Islam that all Muslims, including non-Arabs, recite as part of their worship. Because of characteristics of Arabic letters such as diacritics (punctuation symbols), kashida (extended letters), and other symbols, the Qur'an is written and available in different styles such as Kufi, Naskh, Thuluth, and Uthmani. As social media has become part of our daily life, posting Qur'anic verses downloaded from the web is common. This leads to the problem of authenticating selected Qur'anic passages that are available in different styles. This paper presents a residual-based approach for authenticating Uthmani and plain Qur'anic verses using one common database. The residual (difference) is obtained by analyzing the differences between the Uthmani and plain Qur'anic styles using an XOR operation. Based on predefined data, the proposed approach converts Uthmani text into plain text. Furthermore, we propose to use the Tuned Boyer-Moore (BMT) exact pattern matching algorithm to verify the converted Uthmani verse against a given database of plain Qur'anic style. Experimental results show that the proposed approach is useful and effective in authenticating multi-style texts of the Qur'an, with 87.1% accuracy.
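
The sketch below illustrates the overall flow described in the abstract: convert an Uthmani-style verse to a plain form and verify it against a plain-style reference database with exact matching. The real system derives the residual with an XOR-based comparison over predefined data and matches with the Tuned Boyer-Moore (BMT) algorithm; here the residual is approximated by stripping Unicode combining marks and matching uses Python's built-in substring search, so this is purely an illustration.

```python
# Illustrative conversion + exact-match verification (not the paper's method).
import unicodedata

def to_plain(uthmani_verse: str) -> str:
    # Remove combining marks (diacritics and similar symbols) after normalizing.
    decomposed = unicodedata.normalize("NFKD", uthmani_verse)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def authenticate(uthmani_verse: str, plain_database) -> bool:
    """plain_database: iterable of plain-style reference verses."""
    candidate = to_plain(uthmani_verse)
    return any(candidate in plain_verse for plain_verse in plain_database)
```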


Multimedia Tools and Applications | 2018

A scene image classification technique for a ubiquitous visual surveillance system

Maryam Asadzadeh Kaljahi; Shivakumara Palaiahnakote; Mohammad Hossein Anisi; Mohd Yamani Idna Idris; Michael Myer Blumenstein; Muhammad Khurram Khan

The concept of smart cities has quickly evolved to improve the quality of life and provide public safety. Smart cities mitigate harmful environmental impacts and offences and bring energy efficiency, cost savings, and mechanisms for better use of resources based on ubiquitous monitoring systems. However, existing visual ubiquitous monitoring systems have been developed only for specific purposes and, as a result, cannot be used across different scenarios. To overcome this challenge, this paper presents a new ubiquitous visual surveillance mechanism based on the classification of scene images. The proposed mechanism supports different applications, including soil, flood, air, plant growth, and garbage monitoring. To classify the scene images of the monitoring systems, we introduce a new technique that combines edge strength and sharpness to detect focused edge components in the Canny and Sobel edge maps of the input images. For each focused edge component, a patch that merges nearest-neighbor components in the Canny and Sobel edge images is defined. For each patch, the contribution of the pixels in each cluster given by k-means clustering on edge strength and sharpness is estimated as a percentage of pixels, and these percentage values are used as a feature vector for classification with a Support Vector Machine (SVM) classifier. Experimental results show that the proposed technique outperforms state-of-the-art scene categorization methods, and that the SVM classifier performs better than rule-based and template-based methods.
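
The following is a condensed sketch of the pipeline summarized above, under the assumption that "edge strength" can be approximated by the Sobel gradient magnitude and "sharpness" by the Laplacian response at Canny edge pixels; the paper's focused-edge detection and patch-merging steps are more involved and are not reproduced.

```python
# Edge strength + sharpness at edge pixels -> k-means cluster percentages
# -> fixed-length feature vector -> SVM (sketch under stated assumptions).
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def scene_feature(image_bgr: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    strength = cv2.magnitude(cv2.Sobel(gray, cv2.CV_64F, 1, 0),
                             cv2.Sobel(gray, cv2.CV_64F, 0, 1))
    sharpness = np.abs(cv2.Laplacian(gray, cv2.CV_64F))
    edges = cv2.Canny(gray, 100, 200) > 0               # edge pixels of interest
    samples = np.stack([strength[edges], sharpness[edges]], axis=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(samples)
    # Percentage of edge pixels falling into each cluster forms the feature.
    return np.bincount(labels, minlength=n_clusters) / len(labels)

# Usage sketch: clf = SVC(kernel="rbf").fit(
#     np.stack([scene_feature(im) for im in images]), class_labels)
```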


Pacific Rim Conference on Multimedia | 2017

Cloud of Line Distribution for Arbitrary Text Detection in Scene/Video/License Plate Images

Wenhai Wang; Yirui Wu; Shivakumara Palaiahnakote; Tong Lu; Jun Liu

Detecting arbitrarily oriented text in scene and license plate images is challenging due to multiple adverse factors arising from images of diverse applications. This paper proposes a novel idea of extracting a Cloud of Line Distribution (COLD) for the text candidates given by extremal regions (ERs). The features extracted by COLD are fed to a random forest to label character components, and the character components are grouped according to the probability distribution of nearest-neighbor components, which results in text lines. The proposed method is demonstrated on standard databases of natural scene images (ICDAR 2015), video images (ICDAR 2015), and license plates. Experimental results and a comparative study show that the proposed method outperforms existing methods in terms of invariance to rotation, script, and application.
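
Below is a loose sketch of a COLD-style descriptor, assuming it follows the generic "cloud of line distribution" idea: lines between sampled contour-point pairs of a candidate region are summarized by a 2D histogram over their angles and (log) lengths, and the histogram is fed to a random forest to label the region as character or non-character. The paper's exact descriptor, ER extraction, and nearest-neighbor grouping are not reproduced.

```python
# Sketch of an angle/length histogram over contour-point pairs + random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def cold_descriptor(contour: np.ndarray, n_pairs: int = 500, bins=(8, 8),
                    seed: int = 0) -> np.ndarray:
    """contour: (N, 2) array of boundary points of one text candidate."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(contour), size=n_pairs)
    j = rng.integers(0, len(contour), size=n_pairs)
    d = contour[i] - contour[j]
    angles = np.arctan2(d[:, 1], d[:, 0])
    lengths = np.log1p(np.hypot(d[:, 0], d[:, 1]))
    hist, _, _ = np.histogram2d(angles, lengths, bins=bins)
    return (hist / max(hist.sum(), 1)).ravel()

# Usage sketch: forest = RandomForestClassifier(n_estimators=100).fit(X, y)
# where each row of X is cold_descriptor(contour) for a labelled region.
```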

Collaboration


Dive into Shivakumara Palaiahnakote's collaborations.

Top Co-Authors

Chew Lim Tan

National University of Singapore

Wenyin Liu

City University of Hong Kong

Mohd Yamani Idna Idris

Information Technology University
