Publication


Featured research published by Peeta Basa Pati.


Pattern Recognition Letters | 2008

Word level multi-script identification

Peeta Basa Pati; A. G. Ramakrishnan

We report an algorithm to identify the script of each word in a document image. We start with a bi-script scenario which is later extended to tri-script and then to eleven-script scenarios. A database of 20,000 words of different font styles and sizes has been collected and used for each script. Effectiveness of Gabor and discrete cosine transform (DCT) features has been independently evaluated using nearest neighbor, linear discriminant and support vector machines (SVM) classifiers. The combination of Gabor features with nearest neighbor or SVM classifier shows promising results; i.e., over 98% for bi-script and tri-script cases and above 89% for the eleven-script scenario.
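As a rough illustration of the Gabor-feature and nearest-neighbor combination mentioned in this abstract, the following minimal sketch computes the mean filter energy at four orientations and assigns the label of the closest training vector. It is not the authors' implementation: the kernel size, wavelength, sigma, number of orientations, the energy feature, and the toy stripe images standing in for word images are all assumptions made for the example.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel, size x size (size assumed odd), with zero mean."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so flat regions produce no response.
    mean = sum(sum(row) for row in kernel) / (len(kernel) ** 2)
    return [[v - mean for v in row] for row in kernel]

def filter_energy(image, kernel):
    """Mean squared filter response over all fully overlapping positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            r = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            total += r * r
            count += 1
    return total / count

def gabor_features(image, size=9, wavelength=4.0, sigma=2.0, orientations=4):
    """One energy value per orientation (0, 45, 90, 135 degrees by default)."""
    return [filter_energy(image,
                          gabor_kernel(size, wavelength, k * math.pi / orientations, sigma))
            for k in range(orientations)]

def nearest_neighbor(query, training):
    """training is a list of (feature_vector, label); return the closest label."""
    return min(training, key=lambda item: math.dist(query, item[0]))[1]

def stripes(n, vertical, shift=0):
    """Hypothetical toy image: stripes two pixels wide, standing in for a word image."""
    return [[1.0 if (((x if vertical else y) + shift) // 2) % 2 == 0 else 0.0
             for x in range(n)] for y in range(n)]

training = [(gabor_features(stripes(16, True)), "vertical"),
            (gabor_features(stripes(16, False)), "horizontal")]
label = nearest_neighbor(gabor_features(stripes(16, True, shift=1)), training)
```

In a real setting the training list would hold feature vectors from labeled word images of each script, and the filter bank would normally span several frequencies as well as orientations.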


Sadhana - Academy Proceedings in Engineering Sciences | 2002

Script identification in printed bilingual documents

D. Dhanya; A. G. Ramakrishnan; Peeta Basa Pati

Identification of the script of the text in multi-script documents is one of the important steps in the design of an OCR system for the analysis and recognition of the page. Much work has already been reported in this area relating to Roman, Arabic, Chinese, Korean and Japanese scripts. In the Indian context, though some results have been reported, the task is still in its infancy. In the work presented in this paper, a successful attempt has been made to identify the script, at the word level, in a bilingual document containing Roman and Tamil scripts. Two different approaches have been proposed and thoroughly tested. In the first method, words are divided into three distinct spatial zones. The spatial spread of a word in the upper and lower zones, together with the character density, is used to identify the script. The second technique analyses the directional energy distribution of a word using Gabor filters with suitable frequencies and orientations. Words with various font styles and sizes have been used for the testing of the proposed algorithms and the results are quite encouraging.


International Conference on Intelligent Sensing and Information Processing | 2004

Gabor filters for document analysis in Indian bilingual documents

Peeta Basa Pati; S. Sabari Raju; Nishikanta Pati; A. G. Ramakrishnan

Reasonable success has been achieved in developing monolingual OCR systems in Indian scripts. Scientists, optimistically, have started to look beyond. Development of bilingual OCR systems and OCR systems with the capability to identify the text areas are some of the pointers to future activities in the Indian scenario. The separation of text and non-text regions before considering the document image for OCR is an important task. In this paper, we present a biologically inspired, multi-channel filtering scheme for page layout analysis. The same scheme has been used for script recognition as well. Parameter tuning is mostly done heuristically. It has also been seen to be computationally viable for commercial OCR system development.


Document Analysis Systems | 2006

HVS inspired system for script identification in Indian multi-script documents

Peeta Basa Pati; A. G. Ramakrishnan

Identification of the script of the text present in multi-script documents is one of the important first steps in the design of an OCR system. Much work has been reported relating to Roman, Arabic, Chinese, Korean and Japanese scripts. Though some work has already been reported involving Indian scripts, the work is still in its nascent stage. For example, most of the work assumes that the script changes only at the level of the line, which is rarely an acceptable assumption in the Indian scenario. In this work, we report a script identification algorithm which takes into account the fact that the script changes at the word level in most Indian bilingual or multilingual documents. Initially, we deal with the identification of the script of words, using Gabor filters, in a bi-script scenario. Later, we extend this to tri-script and then five-script scenarios. The combination of Gabor features with a nearest neighbor classifier shows promising results. Words of different font styles and sizes are used. We have shown that our identification scheme, inspired by the Human Visual System (HVS) and utilizing the same feature and classifier combination, works consistently well for every combination of scripts tested.


International Symposium on Visual Computing | 2005

Text localization and extraction from complex color images

S. Sabari Raju; Peeta Basa Pati; A. G. Ramakrishnan

The availability of mobile and hand-held imaging devices, such as cell phones, PDAs, and still and video cameras, has resulted in new applications where the text present in the acquired images is extracted and interpreted for various purposes. In this paper, we present a new algorithm for automatic detection of text in color images. The proposed system involves Gabor function based multi-channel filtering on the intensity component of the image along with graph-theoretical clustering applied on the color space of the same image, thereby utilizing the advantages of texture analysis as well as those of connected components for text detection. Our approach performs well on images with complex backgrounds.


IETE Technical Review | 2005

OCR in Indian scripts: A survey

Peeta Basa Pati; A. G. Ramakrishnan

India is a multi-lingual country. A significantly large number of scripts are used to represent these languages. A desire of vision researchers is to develop an integrated Optical Character Recognition (OCR) system which will be able to process all such scripts. Such a development, if realized, will not only enable faster flow of information across the country, but also have a profound impact on its scientific and economic development. Courageous endeavors have been successfully made towards the development of a system capable of recognizing machine-printed or hand-written characters and/or numerals. However, most Indian scripts do not have an integrated OCR system. Further, the development of a unified system which is capable of processing all Indian scripts is still a dream. This article presents a survey of the current literature on the development of OCRs in Indian scripts. Reviewing the basics of and the motivation towards the development of an OCR system, the article analyzes the various methodologies employed in general purpose pattern recognition systems. A critical analysis of the work towards OCR systems in Indian languages, with pointers towards possible future work, is also presented.


International Conference on Intelligent Sensing and Information Processing | 2006

Automatic text block separation in document images

Kr Arvind; Peeta Basa Pati; A. G. Ramakrishnan

Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks have been considered and efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
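The horizontal projection profile described in this abstract can be sketched as below. The abstract does not define "length-normalized", so the sketch assumes it means resampling the profile to a fixed number of samples so that blocks of different heights yield comparable feature vectors. The function names and the linear interpolation are choices made for the example, not details taken from the paper.

```python
def horizontal_projection_profile(block):
    """Count foreground (1) pixels in each row of a binary image block."""
    return [sum(row) for row in block]

def length_normalize(profile, length=8):
    """Resample a profile to `length` samples by linear interpolation.

    Assumption: this is one plausible reading of "length-normalized";
    the source does not spell out the normalization.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    n = len(profile)
    if n == 1:
        return [float(profile[0])] * length
    out = []
    for k in range(length):
        pos = k * (n - 1) / (length - 1)  # fractional index into the profile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out

# Toy block: three "text lines" separated by blank rows.
block = [[1] * 6, [0] * 6, [1] * 6, [0] * 6, [1] * 6]
hpp = horizontal_projection_profile(block)
features = length_normalize(hpp, length=9)
```

Printed text tends to produce the regular peak-and-valley pattern seen in this toy profile, which is the regularity the abstract relies on. Handwriting, logos and signatures usually give less periodic profiles.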


International Conference on Document Analysis and Recognition | 2007

A Blind Indic Script Recognizer for Multi-script Documents

Peeta Basa Pati; A. G. Ramakrishnan

We report a hierarchical blind script identifier for 11 different Indian scripts. An initial grouping of the 11 scripts is accomplished at the first level of this hierarchy. At the subsequent level, we recognize the script in each group. The various nodes of this tree use different feature-classifier combinations. A database of 20,000 words of different font styles and sizes is collected and used for each script. Effectiveness of Gabor and Discrete Cosine Transform features has been independently evaluated using nearest neighbor, linear discriminant and support vector machine classifiers. The minimum and maximum accuracies obtained, using this hierarchical mechanism, are 92.2% and 97.6%, respectively.
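The two-level structure described in this abstract can be illustrated with a small sketch: a first node assigns a feature vector to a group, and a second node picks the script within that group. Everything concrete here is hypothetical, including the group and script names, the two-dimensional features, and the use of the same nearest-prototype rule at every node. The abstract states that the actual nodes use different feature-classifier combinations.

```python
import math

def nearest_label(query, labeled_points):
    """Return the label of the closest (feature_vector, label) pair."""
    return min(labeled_points, key=lambda p: math.dist(query, p[0]))[1]

class TwoLevelIdentifier:
    """Hypothetical two-level tree: pick a group first, then a script inside it."""

    def __init__(self, group_prototypes, script_prototypes):
        # group_prototypes: list of (feature_vector, group_name)
        # script_prototypes: dict mapping group_name -> list of (feature_vector, script_name)
        self.group_prototypes = group_prototypes
        self.script_prototypes = script_prototypes

    def predict(self, features):
        group = nearest_label(features, self.group_prototypes)
        script = nearest_label(features, self.script_prototypes[group])
        return group, script

# Placeholder prototypes; real ones would come from labeled word images.
identifier = TwoLevelIdentifier(
    group_prototypes=[((0.0, 0.0), "group_1"), ((10.0, 10.0), "group_2")],
    script_prototypes={
        "group_1": [((0.0, 1.0), "script_a"), ((1.0, 0.0), "script_b")],
        "group_2": [((9.0, 10.0), "script_c"), ((10.0, 9.0), "script_d")],
    },
)
result = identifier.predict((0.2, 0.9))
```

One appeal of such a hierarchy is that each second-level node only has to separate the few scripts in its own group, so it can use features suited to that group.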


International Conference on Computing Theory and Applications | 2007

Automatic Seal Information Reader

Farshad Nourbakhsh; Peeta Basa Pati; A. G. Ramakrishnan

Seals contain a lot of vital information about the document. Seal detection is a necessary step to gain access to that information. In the present paper, we present a correlation based technique which exploits the existence of some constant character strings and their topology for detection of seals. We also present a technique which separates the false positives from the actual seals. We achieved an accuracy of about 98% for extraction of such seals. Once the seal is extracted from a page image, an OCR has been employed to read the contextual information present in the seal. A knowledge based post-processing step is employed to enhance the accuracy of the recognized text strings.


International Conference on Intelligent Sensing and Information Processing | 2006

Document Page Layout Analysis Using Harris Corner Points

Farshad Nourbakhsh; Peeta Basa Pati; A. G. Ramakrishnan

Extraction of text areas from document images with complex content and layout is a challenging task. A few texture based techniques have already been proposed for extraction of such text blocks. Most of these techniques are computationally expensive and hence are far from being realizable for real time implementation. In this work, we propose a modification to two of the existing texture based techniques to reduce the computation. This is accomplished with Harris corner detectors. The efficiency of these two texture based algorithms, one based on Gabor filters and the other on log-polar wavelet signatures, is compared. A combination of Gabor feature based texture classification performed on a smaller set of Harris corner detected points is observed to deliver both accuracy and efficiency.
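The idea of restricting texture analysis to Harris corner points can be sketched as follows. This is a plain textbook Harris response with central-difference gradients and a 3x3 summation window, written only to show how a small candidate set might be selected. The window size, the constant k, the threshold, and the toy image are assumptions, not values from the paper.

```python
def harris_response(image, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 for each pixel."""
    h, w = len(image), len(image[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients; border pixels keep a zero gradient.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            # Structure tensor M summed over a 3x3 neighborhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            response[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return response

def corner_points(image, threshold=0.1):
    """Pixels whose Harris response exceeds the (assumed) threshold."""
    r = harris_response(image)
    return [(y, x) for y in range(len(image)) for x in range(len(image[0]))
            if r[y][x] > threshold]

# Toy image: a bright square on a dark background.
square = [[1.0 if 3 <= y <= 10 and 3 <= x <= 10 else 0.0 for x in range(14)]
          for y in range(14)]
candidates = corner_points(square)
```

A texture classifier would then be evaluated only at the candidate points rather than at every pixel, which is where the reduction in computation comes from.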

Collaboration


Dive into Peeta Basa Pati's collaborations.

Top Co-Authors

A. G. Ramakrishnan

Indian Institute of Science

Farshad Nourbakhsh

Indian Institute of Science

S. Sabari Raju

Indian Institute of Science

Kr Arvind

Indian Institute of Science

Bhavna Antony

Indian Institute of Science

D. Dhanya

Indian Institute of Science

Nishikanta Pati

Indian Institute of Science

R.S. Umesh

Indian Institute of Science
