Dharam Veer Sharma
Punjabi University
Publication
Featured research published by Dharam Veer Sharma.
International Conference on Pattern Recognition | 2006
Dharam Veer Sharma; Gurpreet Singh Lehal
Segmentation of handwritten text in Gurmukhi script is difficult, primarily because of the structural features of the script and varied writing styles. The presence of a horizontal line connecting the characters of a word (the headline), half characters, and the overlapping of some vowels between the middle and lower zones of a word make the task even more difficult. Handwritten text is also prone to overlapped, connected and merged characters within a word. Structural features are helpful in segmenting machine-printed text but are of little help in segmenting handwritten words. The proposed technique segments words iteratively by focusing on the presence of the headline, the aspect ratio of characters, and the vertical and horizontal projection profiles. The approach can be used for handwritten text in Indian-language scripts such as Devnagri and Bangla, which have structural features similar to those of Gurmukhi script.
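As a rough illustration of the projection profiles mentioned in the abstract, the sketch below finds a headline row and splits a word into column ranges. It is a single-pass simplification, not the authors' iterative method: the aspect-ratio checks are omitted, the image is assumed to be a binary grid of 0/1 pixels, and all function names are hypothetical.

```python
def horizontal_projection(image):
    """Count ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def vertical_projection(image):
    """Count ink pixels (1s) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def find_headline_row(image):
    """Assumption: the row with the most ink is taken as the headline."""
    profile = horizontal_projection(image)
    return profile.index(max(profile))


def segment_columns(image, headline_row):
    """Return (start, end) column ranges separated by empty columns,
    looking only at the rows below the headline."""
    below = image[headline_row + 1:]
    profile = vertical_projection(below) if below else [0] * len(image[0])
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                       # a character candidate begins
        elif count == 0 and start is not None:
            segments.append((start, x - 1))  # a gap closes the candidate
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1))
    return segments


# Toy word: row 1 is the headline, two strokes hang below it.
word = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
]
headline = find_headline_row(word)          # 1
pieces = segment_columns(word, headline)    # [(1, 2), (5, 6)]
```

Touching or overlapping handwritten characters leave no empty column between them, which is why a plain profile split like this one is not sufficient on its own for handwritten words.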
Proceedings of the International Workshop on Multilingual OCR | 2009
Dharam Veer Sharma; Gurpreet Singh Lehal; Preety Kathuria
The work presented in this paper focuses on the extraction and recognition of digits (Roman as well as Gurmukhi) from machine-printed Gurmukhi documents. The whole process consists of three stages. The first, the segmentation stage, takes an image of a document as input and separates its logical parts: the lines of a paragraph, the words of a line and the characters of a word. A probable set of digits is then extracted based on the features that distinguish digits from other Gurmukhi text. The next, the feature extraction stage, analyzes the set of probable digits and selects a set of structural and statistical features that can be used to uniquely identify the digits. The selection of a stable and representative set of features is the heart of a digit recognition system. The final, the classification stage, is the main decision-making stage of the system and uses the features extracted in the previous stage to identify the digit. A non-parametric statistical classifier, K-Nearest Neighbour, is used for recognition. The best recognition accuracy is achieved with DDD features: 95% for Roman digits and 92.6% for Gurmukhi digits.
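The classification stage described above can be sketched with a minimal K-Nearest Neighbour classifier. This is a generic illustration rather than the authors' implementation: the two-value feature vectors, the labels and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Label a feature vector by majority vote among its k nearest
    training examples. `training` is a list of (vector, label) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for two digit classes.
training_set = [
    ((0.0, 0.0), "0"),
    ((0.1, 0.2), "0"),
    ((1.0, 1.0), "1"),
    ((0.9, 1.1), "1"),
    ((1.2, 0.8), "1"),
]

label_a = knn_classify((0.2, 0.1), training_set)  # "0"
label_b = knn_classify((1.0, 0.9), training_set)  # "1"
```

In a real system each vector would hold the structural and statistical features produced by the feature extraction stage, and k would be tuned on held-out data.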
International Conference on Document Analysis and Recognition | 2009
Dharam Veer Sharma; Gurpreet Singh Lehal; Sarita Mehta
A post-processor is an integral part of any OCR system. This paper proposes a method for detecting and correcting errors in the recognition results of handwritten and machine-printed Gurmukhi OCR. The consonants of Gurmukhi script are divided into sets based on shape similarity, and each set is given a unique number. In case of a recognition error, corrections are made by considering each consonant of the relevant set. According to the proposed algorithm, each recognized word is first encoded based on its consonants, and the resulting code is then searched in the dictionary. If the code exists, the words listed under it are matched against the source word. If a match is found, the word is treated as correct; otherwise, suggestions are made based on the similarity of the source word to the dictionary words that share the same code. The method has been tested on the OCR output of a variety of machine-printed and handwritten documents.
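The encode-then-look-up idea can be sketched as follows. The shape groups below are hypothetical and use Latin consonants purely for readability; they are not the Gurmukhi consonant sets defined in the paper. Ranking suggestions with `difflib.SequenceMatcher` is likewise an assumption, since the abstract does not specify the similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical shape-similarity groups (illustrative only).
SHAPE_GROUPS = {1: "bhk", 2: "cgq", 3: "mnr", 4: "flt", 5: "vwy"}
CODE_OF = {ch: str(code) for code, chars in SHAPE_GROUPS.items() for ch in chars}


def encode(word):
    """Encode a word by the group numbers of its consonants;
    characters outside every group (such as vowels) are skipped."""
    return "".join(CODE_OF[ch] for ch in word if ch in CODE_OF)


def build_index(dictionary_words):
    """Map each shape code to the dictionary words that share it."""
    index = {}
    for entry in dictionary_words:
        index.setdefault(encode(entry), []).append(entry)
    return index


def check_word(word, index):
    """Return (word, []) if the word is in the dictionary, else
    (None, suggestions) ranked by similarity to the recognized word."""
    candidates = index.get(encode(word), [])
    if word in candidates:
        return word, []
    ranked = sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, word, c).ratio(),
        reverse=True,
    )
    return None, ranked


index = build_index(["hat", "bat", "cat", "man"])
accepted = check_word("hat", index)   # ("hat", [])
corrected = check_word("kat", index)  # None, with "hat" and "bat" suggested
```

Because "k", "h" and "b" fall in the same hypothetical shape group, the misrecognized "kat" still maps to the code shared by "hat" and "bat", so only those words are compared with it.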
International Conference on Information Systems | 2011
Dharam Veer Sharma; Puneet Jhajj
The present paper is a comparative study of different feature extraction techniques for the recognition of isolated handwritten characters in Gurmukhi script. The whole process consists of three stages. The first, the feature extraction stage, analyzes the set of isolated characters and selects a set of features that can be used to uniquely identify each character. Zoning, Directional Distance Distribution (DDD) and Gabor methods have been used to obtain a stable and representative set of features. The second, the classification stage, uses the features extracted in the first stage to identify the character; a Support Vector Machine (SVM) has been used as the classifier. In the third stage, the feature extraction methods are compared with respect to recognition rate. An annotated image database of isolated handwritten characters in Gurmukhi script has been prepared and used for training and testing the system. Gabor-based feature extraction gave better results than the other methods.
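Of the three feature extraction methods named above, zoning is the simplest to illustrate: the character image is divided into a grid and the ink density of each cell becomes one feature. The sketch below is a generic version under assumed conventions (a binary 0/1 image and a square grid); the grid size used in the paper is not stated in the abstract.

```python
def zoning_features(image, zones_per_side=2):
    """Split a binary image into zones_per_side x zones_per_side cells
    and return the fraction of ink pixels in each cell, row by row."""
    rows, cols = len(image), len(image[0])
    features = []
    for zr in range(zones_per_side):
        r0 = zr * rows // zones_per_side
        r1 = (zr + 1) * rows // zones_per_side
        for zc in range(zones_per_side):
            c0 = zc * cols // zones_per_side
            c1 = (zc + 1) * cols // zones_per_side
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


# Toy 4x4 character image.
glyph = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vector = zoning_features(glyph)  # [0.75, 0.0, 0.0, 1.0]
```

The resulting vector is what a classifier such as an SVM would receive as input for each character.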
International Conference on Document Analysis and Recognition | 2009
Dharam Veer Sharma; Gurpreet Singh Lehal
Machine recognition of hand-filled forms is a challenging task. Form processing involves many activities, including form field location, field frame boundary removal and data image extraction, segmentation, feature extraction, classification and recognition. The paper proposes an algorithm for removing the field frame boundary of hand-filled forms in Gurmukhi script. Because of the structural characteristics of the Gurmukhi script, the use of the headline and varied writing styles, the filled-in data may overlap or merge with the field frame boundaries, which makes field data extraction very challenging. It is particularly difficult to remove the field frame boundaries while preserving the filled-in data. Experimental results reveal the efficiency of the proposed method in removing the field frame boundary and extracting the field data from form documents. Although the algorithm has been developed and tested for Gurmukhi script, with minor or no changes it can be applied to scripts with structural features similar to those of Gurmukhi, such as Bangla and Devnagari.
International Journal of Computer Trends and Technology | 2016
Harmohan Sharma; Dharam Veer Sharma
OCR of Nastaleeq script has gained a lot of importance in the recent past, owing to the need to preserve historic manuscripts and make them searchable, besides other applications of OCR. Nastaleeq, being a complex script, has so far remained largely untouched by automation, and the little work done so far has proved insufficient to meet the need. Developing OCR for languages based on Urdu script is more complex than for scripts such as Latin and Chinese because of the complexities of Urdu script: the cursive nature of the writing, context-sensitive shapes, overlapping between ligatures, the use of joiners, the formation of ligatures within words and the spacing between ligatures. This paper analyzes the Urdu language, the characteristics of Nastaleeq script and the complexities involved in developing Urdu OCR.
International Conference on Information Systems | 2011
Dharam Veer Sharma; Shilpi Wadhwa
During the scanning of bound documents, part of the document image is curled near the corners or near the binding, which bends the text lines. This hard-to-tackle distortion makes recognition very difficult. A method has been proposed for estimating and removing the line-bending deformations introduced in document images during scanning. Estimating the bend involves determining the side of the document on which the curl is present and the direction of the bend. The method has been tested on a variety of printed Gurmukhi document images containing bent text lines at page borders. The method consists of three stages. In the first stage, a decision methodology is proposed to locate the site and the direction of the deformation. In the second stage, an elliptical approximation model is derived to estimate the amount of deformation. Finally, a transformation process applies the correction. Experiments show that the method works well under conditions where the pixel distribution is uniform.
Archive | 2018
Dharam Veer Sharma; Harmohan Sharma
The major task in any pattern recognition application is the collection of training data that is sufficiently representative of the underlying patterns. In Urdu, this problem is aggravated by the difficulty of segmenting words into characters. Because of the structure of Urdu scripts, Naskh or Nastaleeq, it is very difficult to segment a word into identifiable individual characters. Segmentation is only possible up to ligatures, where a ligature is a word or part of a word and consists of one or more characters. Corpora analysis has revealed that there are more than 25,000 ligatures, which makes the problem of image data collection and classification very large. For the current work, the set of most commonly used ligatures, which comes to 1197 primary components and covers the maximum number of ligatures used, has been considered. More than 850,000 ligatures have been isolated from scanned pages of 30 Urdu books. Manually classifying these ligature components into different classes for training is an extremely time-consuming, monotonous and error-prone task. The current work focuses on automatically separating isolated primary components of ligature images into separate classes based on their similarity.
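The abstract does not say how similarity is measured or how classes are formed, so the following is only one simple possibility: a greedy "leader" grouping in which each image joins the first group whose first member it resembles closely enough. The pixel-agreement measure, the threshold and the assumption that all images have been normalized to the same size are all choices made for this sketch.

```python
def similarity(a, b):
    """Fraction of pixel positions on which two equal-size binary
    images agree (assumes both were normalized to the same size)."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    matches = sum(1 for x, y in zip(flat_a, flat_b) if x == y)
    return matches / len(flat_a)


def group_by_similarity(images, threshold=0.9):
    """Greedy grouping: each image joins the first group whose leader
    (first member) is at least `threshold` similar, else starts a group.
    Returns lists of indices into `images`."""
    groups = []
    for i, image in enumerate(images):
        for group in groups:
            if similarity(images[group[0]], image) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy 2x2 "component" images.
components = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
classes = group_by_similarity(components, threshold=0.75)  # [[0, 1, 3], [2]]
```

A grouping like this depends on the order in which images arrive, so in practice the resulting classes would still need spot checks before being used as training labels.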
International Conference on Information Systems | 2011
Dharam Veer Sharma; Gurpreet Singh Lehal
A form processing system improves the efficiency of data entry and analysis in offices using state-of-the-art technology. It typically consists of several sequential tasks or functional components: form designing, form template registration, field isolation, bounding box removal or colour dropout, field-image extraction, segmentation, feature extraction from the field image, and field recognition. The major challenges for a form processing system are the large quantity of forms and the wide variety of writing styles of different individuals.
Proceedings of the International Workshop on Multilingual OCR | 2009
Dharam Veer Sharma; Gurpreet Singh Lehal
During the scanning of documents, the image may become skewed because of improper alignment of the paper on the scanner, which results in misaligned text in the document image. In some cases the image may even have a double skew, at both the page level and the word level, because of curl near the binding of a book or in old typed or printed documents. Skew detection and correction is therefore an indispensable pre-processing task before text recognition. This paper proposes a robust technique for skew detection and correction of isolated words in machine-printed Gurmukhi documents. The method relies on the structural properties of words in Indic scripts. The algorithm first identifies skewed words and then corrects only those words. According to the proposed technique, isolated words with a straight headline are not considered skewed, but when the length of the headline is less than a threshold value the word may be skewed and becomes a target for correction. The algorithm can be equally effective for machine-printed documents in other scripts where a headline is used to connect the characters of a word.
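The headline-length test described above can be sketched as follows. The sketch assumes a binary 0/1 word image, measures the headline as the longest horizontal run of ink in any row, and uses a hypothetical threshold of 70% of the word width; the abstract does not give the actual threshold or the correction step, so only detection is shown.

```python
def longest_run(row):
    """Length of the longest unbroken run of ink pixels in one row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def may_be_skewed(word_image, min_ratio=0.7):
    """Flag a word as possibly skewed when its longest horizontal ink
    run (taken as the headline) is shorter than min_ratio * width.
    min_ratio is an assumed value, not one taken from the paper."""
    width = len(word_image[0])
    headline_length = max(longest_run(row) for row in word_image)
    return headline_length < min_ratio * width


# A straight headline spans the whole word in one row.
straight_word = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
]

# A skewed headline is broken into short steps across several rows.
skewed_word = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]

flag_straight = may_be_skewed(straight_word)  # False
flag_skewed = may_be_skewed(skewed_word)      # True
```

A rotated headline crosses several pixel rows, so no single row contains a run as long as the word is wide, which is what the length test picks up.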