Sekhar Mandal
Indian Institute of Engineering Science and Technology, Shibpur
Publications
Featured research published by Sekhar Mandal.
International Journal on Document Analysis and Recognition | 2006
Sekhar Mandal; Shyama Prosad Chowdhury; Amit Kumar Das; Bhabatosh Chanda
The detection and identification of tables in document images is crucial to any document image analysis and digital library system. In this paper we report a simple but effective approach to detecting tables in document pages. The algorithm relies on the observation that tables have distinct columns, which implies that the gaps between fields are substantially larger than the gaps between words in ordinary text lines. This deceptively simple observation has led to the design of a table detection system with low computational cost. The mathematical foundation of the approach is also established, including the formulation of a regular expression for ease of implementation.
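The gap observation lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: each text line is assumed to be already cropped as a binary NumPy array (1 = ink), the inter-word gap is estimated as the page-wide median of column gaps, and the threshold `ratio`, the row labels `T`/`N` and the pattern `T{2,}` are hypothetical stand-ins for the paper's regular expression.

```python
import re
import numpy as np


def column_gaps(line_img: np.ndarray) -> np.ndarray:
    """Widths of the blank column runs lying between inked columns of a
    binary text-line image (1 = ink). Left and right margins are ignored."""
    inked = np.flatnonzero(line_img.any(axis=0))
    if inked.size < 2:
        return np.array([], dtype=int)
    widths = np.diff(inked) - 1  # blank columns between neighbouring inked columns
    return widths[widths > 0]


def label_rows(lines: list[np.ndarray], ratio: float = 2.5) -> str:
    """Label each line 'T' (has a field-sized gap) or 'N' (ordinary text).
    `ratio` is a hypothetical threshold relative to the typical word gap."""
    if not lines:
        return ""
    all_gaps = np.concatenate([column_gaps(line) for line in lines])
    if all_gaps.size == 0:
        return "N" * len(lines)
    word_gap = np.median(all_gaps)  # page-wide estimate of the inter-word gap
    return "".join(
        "T" if np.any(column_gaps(line) >= ratio * word_gap) else "N"
        for line in lines
    )


def table_blocks(labels: str, min_rows: int = 2) -> list[tuple[int, int]]:
    """(first_row, end_row) spans, end-exclusive, of at least `min_rows`
    consecutive 'T' lines, found with the regular expression T{min_rows,}."""
    pattern = re.compile(f"T{{{min_rows},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(labels)]
```

For instance, `table_blocks("NTTTN")` returns `[(1, 4)]`, i.e. rows 1 to 3 form one candidate table. Expressing the row sequence as a string keeps the block-finding step a one-line regular-expression match.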
International Conference on Document Analysis and Recognition | 2007
Shyama Prosad Chowdhury; Sekhar Mandal; Amit Kumar Das; Bhabatosh Chanda
Text, graphics and half-tones are the major constituents of any document page. While half-tones can be characterised by their inherent intensity variation, text and graphics share common characteristics, differing mainly in their spatial distribution. The success of a document image analysis system depends on the proper segmentation of text and graphics, since text is further subdivided into other classes such as headings, tables and math zones. Segmentation of graphics is essential for better OCR performance and for vectorization in computer vision applications. Separating graphics from text is particularly difficult when the graphics are made of small components (dashed or dotted lines, etc.), which share many features with text. Here we propose a robust technique for segmenting all sorts of graphics and text in any orientation from document pages.
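As a rough illustration of characterising half-tones by their intensity variation, the sketch below flags tiles of a grayscale scan in which a large share of pixels takes intermediate gray levels, whereas text and line graphics are mostly near black or white. This is a swapped-in heuristic, not the technique proposed in the paper; the function name `halftone_tiles`, the tile size and the thresholds `lo`, `hi` and `min_frac` are all assumptions.

```python
import numpy as np


def halftone_tiles(gray: np.ndarray, block: int = 16, lo: int = 64,
                   hi: int = 192, min_frac: float = 0.3) -> np.ndarray:
    """Boolean grid with one entry per block x block tile of a grayscale
    image; True where the share of mid-gray pixels (lo < value < hi)
    exceeds min_frac. Partial tiles at the right and bottom are ignored."""
    rows, cols = gray.shape[0] // block, gray.shape[1] // block
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            tile = gray[r * block:(r + 1) * block, c * block:(c + 1) * block]
            mid = (tile > lo) & (tile < hi)  # pixels that are neither ink nor paper
            mask[r, c] = mid.mean() > min_frac
    return mask
```

Separating text from graphics, the harder part the abstract focuses on, would still require analysing the spatial distribution of the remaining components.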
International Conference on Document Analysis and Recognition | 2003
Sekhar Mandal; Shyama Prosad Chowdhury; Amit Kumar Das; Bhabatosh Chanda
With the aim of extracting structural information from the table of contents (TOC) to help develop a digital document library, the need to identify and segment the TOC page is obvious. The objective of creating a digital document library is to provide a non-labour-intensive, cheap and flexible way of storing, representing and managing paper documents in electronic form, so as to facilitate indexing, viewing, printing and extracting the intended portions. Information from the TOC pages is extracted for use in a document database for effective retrieval of the required pages. We present fully automatic identification and segmentation of a TOC page from a scanned document.
International Conference on Document Analysis and Recognition | 2003
Shyama Prosad Chowdhury; Sekhar Mandal; Amit Kumar Das; Bhabatosh Chanda
With the aim of high-level understanding of the mathematical contents of a document image, the need for a math-zone extraction and recognition technique is obvious. In this paper we present fully automatic segmentation of displayed-math zones from the document image, using only the spatial layout information of math formulas and equations. This is intended to help commercial OCR systems, which cannot discern math zones, and also to support the identification and arrangement of math symbols by others.
IEEE International Conference on Image Information Processing | 2011
Sekhar Mandal; Sanjib Sur; Avishek Dan; Partha Bhowmick
A robust and efficient algorithm to recognize handwritten Bangla (Bengali) characters in machine-printed forms is proposed. It is based on a combination of gradient features and Haar wavelet coefficients. The gradient features capture local characteristics; because they are sensitive to the usual deformations and idiosyncrasies of handwritten characters, the wavelet transform is used for multi-resolution analysis of the character images. This combined-feature strategy captures adequate global characteristics at different scales. Two feature-combination schemes are devised and tested on 4372 test images covering 49 characters and 10 numerals, after training on a set of 59 × 25 = 1475 images. A k-NN classifier is used for character recognition, yielding recognition accuracies of 87.65% and 88.95% for the two schemes.
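A feature pipeline in the same spirit can be sketched with NumPy alone. Everything here is an assumption for illustration rather than the paper's method: a magnitude-weighted gradient-orientation histogram stands in for the gradient features, the approximation bands of a two-level Haar decomposition stand in for the wavelet coefficients, the two are simply concatenated (one possible combination scheme, not necessarily either of the paper's two), and a plain majority-vote k-NN does the classification. Image sides must be divisible by 2 raised to `levels`.

```python
import numpy as np


def haar_level(img: np.ndarray):
    """One level of the orthonormal 2-D Haar transform (even-sized input).
    Returns the approximation band and three detail bands, each half size."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 2
    detail_cols = (a - b + c - d) / 2  # differences between neighbouring columns
    detail_rows = (a + b - c - d) / 2  # differences between neighbouring rows
    detail_diag = (a - b - c + d) / 2
    return approx, detail_cols, detail_rows, detail_diag


def gradient_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Magnitude-weighted histogram of gradient orientations, summing to 1."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist


def features(img: np.ndarray, levels: int = 2) -> np.ndarray:
    """Gradient histogram followed by unit-normalized Haar approximation bands."""
    img = img.astype(float)
    parts = [gradient_histogram(img)]
    approx = img
    for _ in range(levels):
        approx, *_details = haar_level(approx)  # detail bands unused in this sketch
        parts.append(approx.ravel() / (np.linalg.norm(approx) + 1e-12))
    return np.concatenate(parts)


def knn_predict(train_x: np.ndarray, train_y: np.ndarray, x: np.ndarray, k: int = 3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    dist = np.linalg.norm(train_x - x, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

With 59 classes and 25 training samples per class, `train_x` would hold 59 × 25 = 1475 feature vectors, matching the training-set size quoted above.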
International Conference on Image Analysis and Processing | 2003
Sekhar Mandal; Shyama Prosad Chowdhury; Amit Kumar Das; Bhabatosh Chanda
The need to identify and segment the table of contents (TOC) and index pages in the development of a digital library is obvious. A digital document library is created to provide a non-labour-intensive, cheap and flexible way of storing, representing and managing paper documents in electronic form, to facilitate indexing, viewing, printing and extracting the intended portions. Information from the TOC and index pages is extracted for use in a document database for effective retrieval of the required pieces of information. We present fully automatic identification and segmentation of TOC and index pages from a scanned document.
International Conference on Document Analysis and Recognition | 2005
Sekhar Mandal; S. P. Chowdhury; Amit Kumar Das; Bhabatosh Chanda
In this paper we propose a fully automatic hierarchical method for identification of forms using global as well as local features. Moments of certain orders are used as global shape features to reduce the search space by selecting a subset of the forms present in the database. The type of the candidate form is then identified within this subset through detailed analysis using local geometrical and topological features. The candidate form is then segmented to extract the user-filled information.
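Moment-based pruning of the form database might look like the following sketch. It assumes binary form images stored as NumPy arrays and uses normalized central moments η_pq = μ_pq / μ_00^(1 + (p + q)/2) of orders two and three, which are invariant to translation and, approximately on a pixel grid, to scale. The chosen orders, the Euclidean distance, and the names `normalized_moments`, `shortlist` and `keep` are hypothetical; the detailed local-feature stage is not shown.

```python
import numpy as np

# Hypothetical choice of moment orders (second and third order).
ORDERS = ((2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3))


def normalized_moments(img: np.ndarray) -> np.ndarray:
    """eta_pq = mu_pq / mu_00 ** (1 + (p + q) / 2) for each (p, q) in ORDERS."""
    ys, xs = np.nonzero(img)
    w = img[ys, xs].astype(float)
    m00 = w.sum()
    xc, yc = (xs * w).sum() / m00, (ys * w).sum() / m00  # centroid
    return np.array([
        (((xs - xc) ** p) * ((ys - yc) ** q) * w).sum() / m00 ** (1 + (p + q) / 2)
        for p, q in ORDERS
    ])


def shortlist(query: np.ndarray, database: dict[str, np.ndarray], keep: int = 3) -> list[str]:
    """Names of the `keep` forms whose stored moment vectors are closest to the query's."""
    q = normalized_moments(query)
    names = list(database)
    dist = [np.linalg.norm(database[name] - q) for name in names]
    return [names[i] for i in np.argsort(dist)[:keep]]
```

Only the forms returned by `shortlist` would then go through the more expensive local geometrical and topological comparison.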
IEEE International Conference on Image Information Processing | 2011
Paramita De; Sekhar Mandal; Partha Bhowmick
A robust and efficient algorithm for recognition of electrical symbols in digitized documents is proposed. The algorithm is based on morphological operations and geometric analysis to recognize different classes of symbols. Its novelty lies in the construction and use of three spaces, namely the H-, V-, and C-spaces, which contain, respectively, the horizontal line segments, the vertical line segments, and the circuit symbols present in the drawing. These three spaces are built by morphological operations and are then searched and scanned systematically during geometric analysis, so that symbols are recognized by verifying the structural combination of their constituent primitives. Exhaustive experiments have been performed to test the performance of the algorithm. Some of the results are presented in this paper to demonstrate its robustness, efficiency, and versatility.
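The H- and V-spaces can be approximated with morphological openings by horizontal and vertical line structuring elements. Opening with a 1 × L line keeps exactly those pixels that lie in a horizontal foreground run of at least L pixels, so the sketch below implements it with run lengths in plain NumPy. The length L, the function names, and the definition of the C-space as the foreground left over after removing both line spaces are assumptions for illustration, not the paper's construction.

```python
import numpy as np


def open_horizontal(binary: np.ndarray, length: int) -> np.ndarray:
    """Morphological opening with a 1 x `length` line: keeps exactly the pixels
    that belong to a horizontal foreground run of at least `length` pixels."""
    fg = binary.astype(bool)
    out = np.zeros_like(fg)
    for r, row in enumerate(fg):
        padded = np.concatenate(([0], row.astype(int), [0]))
        edges = np.flatnonzero(np.diff(padded))  # alternating run starts and exclusive ends
        for start, end in zip(edges[0::2], edges[1::2]):
            if end - start >= length:
                out[r, start:end] = True
    return out


def open_vertical(binary: np.ndarray, length: int) -> np.ndarray:
    """The same opening with a `length` x 1 line, via transposition."""
    return open_horizontal(binary.T, length).T


def split_spaces(binary: np.ndarray, length: int):
    """Hypothetical H-, V- and C-spaces: long horizontal segments, long vertical
    segments, and the remaining foreground (treated here as symbol pixels)."""
    fg = binary.astype(bool)
    h_space = open_horizontal(fg, length)
    v_space = open_vertical(fg, length)
    c_space = fg & ~(h_space | v_space)
    return h_space, v_space, c_space
```

A wire crossing lands in both the H- and V-spaces, so in this sketch only compact shapes shorter than L in both directions survive into the C-space; verifying how primitives combine into a symbol is left to the geometric analysis the abstract describes.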
IEEE International Conference on Signal and Image Processing | 2014
Paramita De; Sekhar Mandal; Partha Bhowmick
Though annotations are an integral part of symbols in drawings, due attention has yet to be given to their identification, interpretation, and storage. A drawing reconstructed from the vector format therefore lacks a complete description of the original and requires costly, time-consuming manual intervention. This paper presents a method for segmentation and identification of annotations associated with electrical symbols in a circuit diagram, which may be used alongside the vectorizer to make the reconstruction complete. The proposed method first separates the text region around an intended circuit symbol, and then identifies the annotation part of the segmented text corresponding to that particular symbol. Morphological operations are used in the identification phase. Finally, an efficient OCR is used to obtain the numerical values of the symbols along with their units. The performance of the algorithm is tested on a variety of images with ample variation in annotation. Some of the results are presented in this paper to demonstrate its efficiency.
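The two stages described above, selecting text near a symbol and reading a value with its unit, can be sketched as follows. The `Box` representation, the `radius` threshold, the prefix table `PREFIX`, the unit list in `VALUE_RE`, and the assumption that OCR has already returned one string per text box are all hypothetical; the paper's morphological identification step is not reproduced.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def box_gap(a: Box, b: Box) -> float:
    """Euclidean distance between two boxes (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return math.hypot(dx, dy)


def nearby_text(symbol: Box, texts: list[tuple[Box, str]], radius: float) -> list[str]:
    """OCR strings whose boxes lie within `radius` of the symbol's box."""
    return [s for box, s in texts if box_gap(symbol, box) <= radius]


# Assumed SI prefixes and units; a real system would need a larger table.
PREFIX = {"p": 1e-12, "n": 1e-9, "u": 1e-6, "µ": 1e-6, "m": 1e-3, "": 1.0, "k": 1e3, "M": 1e6}
VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([pnuµmkM]?)\s*(Ω|ohm|F|H|V|A)?")


def parse_value(text: str):
    """Turn strings such as '4.7uF' or '10k' into (value, unit); None if unparseable."""
    m = VALUE_RE.fullmatch(text.strip())
    if m is None:
        return None
    number, prefix, unit = m.groups()
    return float(number) * PREFIX[prefix], unit or ""
```

For example, `parse_value("10k")` returns `(10000.0, "")`, leaving the unit to be inferred from the symbol class (here, presumably a resistor).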
Second International Conference on Document Image Analysis for Libraries (DIAL'06) | 2006
Sekhar Mandal; S. P. Chowdhury; Amit Kumar Das; Bhabatosh Chanda
Identification and segmentation of the table of contents (TOC) and index pages is an essential task in the development of a digital library. A digital document library is created to provide a non-labour-intensive, cheap and flexible way of storing, representing and managing paper documents in electronic form to facilitate indexing, viewing, printing and extracting the intended portions. Using document image analysis techniques, information from the TOC and index pages may be extracted for use in a document database for effective retrieval of the required pieces of information. In this paper, we present fully automatic identification and segmentation of TOC and index pages from scanned documents.