Publication


Featured research published by Umapada Pal.


Pattern Recognition | 2004

Indian script character recognition : a survey

Umapada Pal; B. B. Chaudhuri

Intensive research has been done on optical character recognition (OCR), and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available in the market, but most of them work for Roman, Chinese, Japanese and Arabic characters. There is comparatively little work on Indian language character recognition, although there are 12 major scripts in India. In this paper, we present a review of the OCR work done on Indian language scripts. The review is organized into 5 sections. Sections 1 and 2 cover the introduction and the properties of Indian scripts. In Section 3, we discuss different methodologies in OCR development as well as research work done on Indian script recognition. In Section 4, we discuss the scope of future work and further steps needed for Indian script OCR development. In Section 5, we conclude the paper.


Pattern Recognition | 1998

A complete printed Bangla OCR system

B. B. Chaudhuri; Umapada Pal

A complete optical character recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented. This is the first OCR system among all script forms used in the Indian sub-continent. The problem is difficult because (i) there are about 300 basic, modified and compound character shapes in the script, (ii) the characters in a word are topologically connected and (iii) Bangla is an inflectional language. In our system, the document image captured by a flatbed scanner is subjected to skew correction, text/graphics separation, line segmentation, zone detection, and word and character segmentation, using some conventional and some newly developed techniques. From zonal information and shape characteristics, the basic, modified and compound characters are separated for the convenience of classification. The basic and modified characters, which are about 75 in number and occupy about 96% of the text corpus, are recognized by a structural-feature-based tree classifier. The compound characters are recognized by a tree classifier followed by a template-matching approach. Feature detection is simple and robust, and preprocessing steps such as thinning and pruning are avoided. Character unigram statistics are used to make the tree classifier efficient. Several heuristics are also used to speed up template matching. A dictionary-based error-correction scheme is used, in which separate dictionaries, which also contain morpho-syntactic information, are compiled for root words and suffixes. For single-font clear documents, 95.50% word-level recognition accuracy (equivalent to 99.10% character-level accuracy) has been obtained. Extension of the work to Devnagari, the third most popular script in the world, is also discussed.
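The abstract does not say how the root-word and suffix dictionaries are searched. As a minimal sketch, assuming a recognized word counts as valid when it splits into a known root followed by a known (possibly empty) suffix, and assuming an invalid word is replaced by the closest valid root+suffix string under Levenshtein edit distance, the lookup could look like this. The dictionaries are hypothetical romanized toy entries, not the authors' lexicon, and the morpho-syntactic information is left out.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertion, deletion and substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def is_valid(word: str, roots: set, suffixes: set) -> bool:
    """True if word = root + suffix for some root and suffix in the dictionaries."""
    return any(word[:k] in roots and word[k:] in suffixes
               for k in range(1, len(word) + 1))


def correct(word: str, roots: set, suffixes: set) -> str:
    """Return word unchanged if valid, else the nearest root+suffix combination.

    Candidates are sorted first, so ties are broken alphabetically.
    """
    if is_valid(word, roots, suffixes):
        return word
    candidates = sorted(r + s for r, s in product(roots, suffixes))
    return min(candidates, key=lambda c: edit_distance(word, c))


# Hypothetical toy dictionaries (romanized), for illustration only.
ROOTS = {"kaj", "bari", "bhalo"}
SUFFIXES = {"", "er", "te", "ke"}
print(correct("kaqte", ROOTS, SUFFIXES))  # kajte
```

Splitting the lexicon this way lets an inflected form be accepted without listing every root/suffix combination, which matters for an inflectional language such as Bangla.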


International Conference on Document Analysis and Recognition | 1997

An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)

B. B. Chaudhuri; Umapada Pal

An OCR system is proposed that can read two Indian language scripts, Bangla and Devnagari (Hindi), the most popular ones in the Indian subcontinent. These scripts share an origin in the ancient Brahmi script and have many features in common, so a single system can be modeled to recognize both. In the proposed model, document digitization, skew detection, text line segmentation and zone separation, word and character segmentation, and grouping of characters into basic, modifier and compound categories are done for both scripts by the same set of algorithms. The feature sets and classification tree, as well as the knowledge base required for error correction (such as the lexicon), differ between Bangla and Devnagari. The system performs well on single-font scripts printed on clear documents.


International Conference on Document Analysis and Recognition | 2013

ICDAR 2013 Handwriting Segmentation Contest

Nikolaos Stamatopoulos; Basilis Gatos; Georgios Louloudis; Umapada Pal; Alireza Alaei

This paper presents the results of the Handwriting Segmentation Contest that was organized in the context of ICDAR 2013. The general objective of the contest was to use well-established evaluation practices and procedures to record recent advances in off-line handwriting segmentation. Two benchmarking datasets, one for text line and one for word segmentation, were created in order to test and compare all submitted algorithms, as well as some state-of-the-art methods for handwritten document image segmentation, in realistic circumstances. Handwritten document images were produced by many writers in two Latin-based languages (English and Greek) and in one Indian language (Bangla, the second most popular language in India). These images were manually annotated in order to produce the ground truth, which corresponds to the correct text line and word segmentation results. The datasets of previously organized contests (the ICDAR 2007, ICDAR 2009 and ICFHR 2010 Handwriting Segmentation Contests), along with a dataset of Bangla document images, were used as the training dataset. Eleven methods were submitted to this competition. A brief description of the submitted algorithms, the evaluation criteria and the segmentation results obtained from the submitted methods are also provided in this manuscript.
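The abstract mentions evaluation criteria without defining them. Earlier ICDAR handwriting segmentation contests counted one-to-one matches between ground-truth and detected regions using a pixel-overlap score, and reported a detection rate (DR), a recognition accuracy (RA) and their harmonic mean (FM). The sketch below assumes that scheme, with each region given as a set of foreground pixel coordinates. The acceptance threshold default is hypothetical, and greedy matching is a simplification.

```python
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Region = FrozenSet[Pixel]  # foreground pixels belonging to one line or word


def match_score(gt: Region, res: Region) -> float:
    """Pixel-overlap score |gt ∩ res| / |gt ∪ res| over foreground pixels."""
    union = gt | res
    return len(gt & res) / len(union) if union else 0.0


def evaluate(gt_regions: List[Region], res_regions: List[Region],
             threshold: float = 0.9) -> Tuple[float, float, float]:
    """Return (DR, RA, FM) from greedy one-to-one matching.

    A ground-truth region and an unused result region form a one-to-one
    match when their best score reaches `threshold` (hypothetical default).
    """
    used = set()
    o2o = 0
    for gt in gt_regions:
        best, best_j = 0.0, None  # type: float, Optional[int]
        for j, res in enumerate(res_regions):
            if j in used:
                continue
            score = match_score(gt, res)
            if score > best:
                best, best_j = score, j
        if best_j is not None and best >= threshold:
            used.add(best_j)
            o2o += 1
    dr = o2o / len(gt_regions) if gt_regions else 0.0   # detection rate
    ra = o2o / len(res_regions) if res_regions else 0.0  # recognition accuracy
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0     # harmonic mean
    return dr, ra, fm
```

Under this kind of metric, both merging two lines and splitting one line lower the score, because neither produces a one-to-one match.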


International Conference on Document Analysis and Recognition | 2003

Segmentation of Bangla unconstrained handwritten text

Umapada Pal; Sagarika Datta

To cope with the variability in the writing styles of different individuals, in this paper we propose a robust scheme to segment unconstrained handwritten Bangla texts into lines, words and characters. For line segmentation, we first divide the text into vertical stripes. The stripe width of a document is computed by statistical analysis of the text height in the document. Next, we determine the horizontal histograms of these stripes, and the relationship between the minimal values of the histograms is used to segment text lines. Based on the vertical projection profile, lines are segmented into words. Segmenting characters from a handwritten word is very tricky, as the characters are seldom vertically separable. We use a concept based on the water reservoir principle for this purpose. Here we first identify isolated and connected (touching) characters in a word. Next, the touching characters of the word are segmented based on the reservoir base-area points and structural features of the component.
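A minimal sketch of the two projection steps, assuming a binary image stored as a list of rows with 1 for ink. The stripe width is passed in as a parameter here rather than estimated from text height. The `min_gap` threshold, which decides when an ink-free gap separates words rather than characters, is a hypothetical parameter. Neither the rule that relates the histogram minima across stripes nor the water-reservoir character step is shown.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of 0 (background) / 1 (ink)


def stripe_histograms(img: Image, stripe_width: int) -> List[List[int]]:
    """Split the page into vertical stripes and return, for each stripe,
    its horizontal histogram (ink count per row)."""
    width = len(img[0])
    hists = []
    for left in range(0, width, stripe_width):
        right = min(left + stripe_width, width)
        hists.append([sum(row[left:right]) for row in img])
    return hists


def segment_words(line: Image, min_gap: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column ranges of words in a text-line image.

    Columns with no ink are gaps; ink runs separated by fewer than
    `min_gap` empty columns are kept in the same word.
    """
    profile = [sum(col) for col in zip(*line)]  # vertical projection profile
    words: List[Tuple[int, int]] = []
    start: Optional[int] = None
    gap = 0
    for x, count in enumerate(profile):
        if count:
            if start is None:
                start = x
            elif gap >= min_gap:
                words.append((start, x - gap - 1))  # close the previous word
                start = x
            gap = 0
        elif start is not None:
            gap += 1
    if start is not None:
        words.append((start, len(profile) - gap - 1))
    return words
```

For example, a line whose vertical profile is `[1, 1, 0, 1, 0, 0, 0, 1, 1, 0]` yields two words, columns 0 to 3 and 7 to 8, when `min_gap=2`.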


International Conference on Document Analysis and Recognition | 1999

Script line separation from Indian multi-script documents

Umapada Pal; B. B. Chaudhuri

In a multi-lingual country like India, a document page may contain more than one script form. Under the three-language formula, the document may be printed in English, Devnagari and one of the other official Indian languages. For OCR of such a document page, it is necessary to separate these three script forms before feeding them to the OCRs of individual scripts. In this paper, an automatic technique of separating the text lines using script characteristics and shape based features is presented. At present, the system has an overall accuracy of about 98.5%.
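The abstract does not list the features used. One script characteristic that is easy to illustrate is the head line that runs along the top of most Devnagari and Bangla characters (see the skew-detection abstract below) and is absent from Roman script. A minimal sketch, assuming a binarized text image and a hypothetical fill threshold, flags a head line when some row is inked across most of the ink's horizontal extent. It is not the paper's feature set.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (background) / 1 (ink)


def has_head_line(line: Image, min_fill: float = 0.75) -> bool:
    """Heuristic head-line test for a text image.

    Finds the horizontal extent of the ink, then checks whether any row is
    inked over at least `min_fill` of that extent. `min_fill` is a
    hypothetical threshold, not a value from the paper.
    """
    inked_cols = [x for x, col in enumerate(zip(*line)) if any(col)]
    if not inked_cols:
        return False
    left, right = inked_cols[0], inked_cols[-1]
    extent = right - left + 1
    best_row = max(sum(row[left:right + 1]) for row in line)
    return best_row / extent >= min_fill
```

Because the spaces between words interrupt the head line, applying the test to single words rather than whole lines may be more robust.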


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1997

Skew angle detection of digitized Indian script documents

B. B. Chaudhuri; Umapada Pal

Skew angle detection of scanned documents containing the most popular Indian scripts (Devnagari and Bangla) is considered. Most characters in these scripts have a horizontal line at the top, called the head line. The character head lines mostly join one another in a word, so the word appears as a single component. In the proposed method, the components are first labeled. The upper envelope of a component is found by columnwise scanning from an imaginary line above the component. Portions of the upper envelope satisfying the properties of a digital straight line are detected. They are clustered as belonging to single text lines. Estimates from individual clusters are combined to get the skew angle. Apart from accuracy and efficiency, an advantage of the method is that character segmentation and zone detection can be readily done from the head line information, which is useful in optical character recognition approaches for these scripts.
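A minimal sketch of the upper-envelope idea for a single labeled component, assuming a binary image stored as a list of rows. Two simplifications are assumptions, not the paper's method. First, a run of columns whose envelope row changes by at most one pixel between neighbours stands in for the digital-straight-line test. Second, each run's slope is fitted by least squares and the runs are combined by a length-weighted mean, in place of the paper's clustering into text lines.

```python
import math
from typing import List, Optional

Image = List[List[int]]  # rows of 0 (background) / 1 (ink)


def upper_envelope(img: Image) -> List[Optional[int]]:
    """Scan each column from the top; record the first ink row (or None)."""
    return [next((r for r, v in enumerate(col) if v), None) for col in zip(*img)]


def straight_runs(env: List[Optional[int]], min_len: int) -> List[List[int]]:
    """Split the envelope into runs of columns whose rows step by at most 1."""
    runs: List[List[int]] = []
    current: List[int] = []
    for x, y in enumerate(env):
        if y is not None and current and abs(y - env[current[-1]]) <= 1:
            current.append(x)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [x] if y is not None else []
    if len(current) >= min_len:
        runs.append(current)
    return runs


def skew_angle(img: Image, min_len: int = 10) -> float:
    """Estimate skew in degrees. Image rows grow downward, so a positive
    angle means the envelope descends to the right."""
    env = upper_envelope(img)
    total_w, weighted = 0, 0.0
    for run in straight_runs(env, max(min_len, 2)):  # >= 2 points per fit
        ys = [env[x] for x in run]
        mx, my = sum(run) / len(run), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in run)
        sxy = sum((x - mx) * (y - my) for x, y in zip(run, ys))
        weighted += len(run) * (sxy / sxx)  # least-squares slope of this run
        total_w += len(run)
    if total_w == 0:
        raise ValueError("no envelope run long enough to estimate skew")
    return math.degrees(math.atan(weighted / total_w))
```

A flat head line gives 0 degrees. A head line that drops one row every four columns gives roughly 14 degrees, since atan(1/4) is about 14.04 degrees and the pixel staircase pulls the fitted slope slightly below 1/4.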


International Conference on Document Analysis and Recognition | 2003

Multi-script line identification from Indian documents

Umapada Pal; Suranjit Sinha; B. B. Chaudhuri

A document page may contain two or more different scripts. For optical character recognition (OCR) of such a document page, it is necessary to separate the different scripts before feeding them to their individual OCR systems. In this paper, an automatic scheme is presented to identify text lines of different Indian scripts in a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.


Pattern Recognition Letters | 2003

Touching numeral segmentation using water reservoir concept

Umapada Pal; Abdel Belaïd; Christophe Choisy

This paper deals with a new technique for the automatic segmentation of unconstrained handwritten connected numerals. To take care of the variability involved in the writing styles of different individuals, a robust scheme is presented here. The scheme is mainly based on features obtained from a water reservoir concept. A reservoir is a metaphor to illustrate the region where numerals touch. A reservoir is obtained by considering the accumulation of water poured from the top or from the bottom of the numerals. First, considering reservoir location and size, the touching position (top, middle or bottom) is decided. Next, by analyzing the reservoir boundary, the touching position and the topological features of the touching pattern, the best cutting point is determined. Finally, combined with morphological structural features, the cutting path for segmentation is generated. The proposed scheme was tested on French bank check data, and an accuracy of about 94.8% was obtained.
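A minimal sketch of top and bottom reservoirs for a binary image of touching numerals. The depth rule is a simplification, assumed here: each column's reservoir depth is computed from the upper profile alone, the way rain water is trapped between higher columns on either side, so overhanging strokes are ignored. The bottom reservoir reuses the same rule on the vertically flipped image. Choosing the touching position and the cutting path from these depths is not shown.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (background) / 1 (ink)


def top_reservoir(img: Image) -> List[int]:
    """Depth (in pixels) of water held in each column when poured from the top."""
    height = len(img)
    # Surface height of each column: distance from the image bottom up to the
    # top-most ink pixel (0 for an empty column).
    surface = []
    for col in zip(*img):
        first = next((r for r, v in enumerate(col) if v), height)
        surface.append(height - first)
    # The water level in a column is bounded by the highest surface on each side.
    left_max, running = [], 0
    for s in surface:
        running = max(running, s)
        left_max.append(running)
    right_max, running = [0] * len(surface), 0
    for x in range(len(surface) - 1, -1, -1):
        running = max(running, surface[x])
        right_max[x] = running
    return [min(lm, rm) - s for lm, rm, s in zip(left_max, right_max, surface)]


def bottom_reservoir(img: Image) -> List[int]:
    """Reservoir formed when water is poured from the bottom: flip and reuse."""
    return top_reservoir(img[::-1])
```

For a U-shaped stroke, the top reservoir fills the 3 × 3 interior (depths `[0, 3, 3, 3, 0]`) while the bottom reservoir is empty. A numeral pair that touches near its top would instead be expected to show a large bottom reservoir under the joint.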


International Conference on Document Analysis and Recognition | 2007

Handwritten Numeral Recognition of Six Popular Indian Scripts

Umapada Pal; Tetsushi Wakabayashi; Nabin Sharma; Fumitaka Kimura

India is a multi-lingual, multi-script country, but there is not much work on handwritten character recognition of Indian languages. In this paper, we propose a modified quadratic classifier-based scheme for the recognition of off-line handwritten numerals of six popular Indian scripts: Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil. The features used in the classifier are obtained from the directional information of the numerals. For feature computation, the bounding box of a numeral is segmented into blocks, and the directional features are computed in each block. These blocks are then down-sampled by a Gaussian filter, and the features obtained from the down-sampled blocks are fed to a modified quadratic classifier for recognition. Two feature sets are used: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition. A five-fold cross-validation technique was used for result computation, and we obtained 99.56%, 98.99%, 99.37%, 98.40%, 98.71% and 98.51% accuracy for the Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil scripts, respectively.
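The abstract does not give the exact feature definition, and the published system also applies Gaussian down-sampling, which is omitted here. As a simplified sketch, assuming a binary numeral image, the code counts, inside each block of a grid laid over the numeral's bounding box, how often an ink pixel has an ink neighbour in each of four directions. A 4 × 4 grid gives 4 × 4 × 4 = 64 values, but this is not claimed to reproduce the paper's 64-dimensional feature.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (background) / 1 (ink)

# Neighbour offsets (row, col) for four stroke directions:
# horizontal, rising diagonal, vertical, falling diagonal.
DIRECTIONS = [(0, 1), (-1, 1), (1, 0), (1, 1)]


def directional_features(img: Image, blocks: int = 4) -> List[int]:
    """Block-wise counts of ink-to-ink neighbour pairs in four directions.

    Returns blocks * blocks * 4 values ordered by block row, block column,
    then direction.
    """
    ink = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    feats = [0] * (blocks * blocks * len(DIRECTIONS))
    if not ink:
        return feats
    top = min(r for r, _ in ink)
    left = min(c for _, c in ink)
    h = max(r for r, _ in ink) - top + 1   # bounding-box height
    w = max(c for _, c in ink) - left + 1  # bounding-box width
    rows, cols = len(img), len(img[0])
    for r, c in ink:
        br = (r - top) * blocks // h   # block row inside the bounding box
        bc = (c - left) * blocks // w  # block column inside the bounding box
        for d, (dr, dc) in enumerate(DIRECTIONS):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and img[nr][nc]:
                feats[(br * blocks + bc) * len(DIRECTIONS) + d] += 1
    return feats
```

A horizontal bar eight pixels long produces seven horizontal neighbour pairs spread over the four blocks of the top grid row and nothing in the other directions. The resulting vector is what a classifier such as the modified quadratic classifier would receive.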

Collaboration


Top Co-Authors

Partha Pratim Roy

Indian Institute of Technology Roorkee

Palaiahnakote Shivakumara

Information Technology University

Josep Lladós

Autonomous University of Barcelona

B. B. Chaudhuri

Indian Statistical Institute

Sukalpa Chanda

Gjøvik University College

Chew Lim Tan

National University of Singapore

Anjan Dutta

Autonomous University of Barcelona
