Samir Malakar | Researchain

Publication

Featured research published by Samir Malakar.

Journal of Intelligent Systems | 2011

Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images

Ram Sarkar; Samir Malakar; Nibaran Das; Subhadip Basu; Mahantapas Kundu; Mita Nasipuri

In this paper, a novel approach to word extraction and character segmentation from handwritten Bangla document images is reported. First, a modified Run Length Smoothing Algorithm (RLSA), called the Spiral Run Length Smearing Algorithm (SRLSA), is applied to extract words from the text lines of unconstrained handwritten Bangla document images. This technique helps to overcome some of the drawbacks of the standard horizontal and vertical RLSA techniques. SRLSA has been applied to the Bangla handwritten document image database CMATERdb1.1.1, and the success rate of word extraction is found to be 86.01%. The second part of the work presents a solution to the problem of how best to segment word images of handwritten Bangla script into their constituent characters. The technique can also segment words that have discontinuities in the Matra, a prominent feature of Bangla script. It optimizes the trade-off between under- and over-segmentation because the Matra region and the segmentation points are estimated more precisely. As a result, better word segmentation accuracy is achieved with minimal data loss. A success rate of 92.48% is observed on a dataset of 750 handwritten Bangla words, which is 3.35% higher than that of our earlier techniques.
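
For readers unfamiliar with run length smoothing, the following is a minimal Python sketch of the standard horizontal RLSA that SRLSA is said to modify. It is not the spiral variant from the paper, whose details are not given in the abstract. The function name `horizontal_rlsa`, the `max_gap` parameter, the 1 = ink / 0 = background convention, and the choice to leave runs touching the image border untouched are all assumptions made for illustration.

```python
def horizontal_rlsa(image, max_gap):
    """Fill short background runs that lie between two ink pixels in a row.

    image   : list of rows, each a list of 0 (background) or 1 (ink)
    max_gap : longest background run (in pixels) that gets filled
    Runs touching the left or right border are left as they are (assumption).
    """
    smeared = [row[:] for row in image]  # work on a copy
    for row in smeared:
        last_ink = None  # column of the most recent ink pixel in this row
        for col, pixel in enumerate(row):
            if pixel == 1:
                gap = col - last_ink - 1 if last_ink is not None else 0
                if 0 < gap <= max_gap:
                    for k in range(last_ink + 1, col):
                        row[k] = 1  # smear the gap
                last_ink = col
    return smeared


# The 2-pixel gap is filled, the 4-pixel gap is kept.
print(horizontal_rlsa([[1, 0, 0, 1, 0, 0, 0, 0, 1]], max_gap=2))
# [[1, 1, 1, 1, 0, 0, 0, 0, 1]]
```

With a suitable `max_gap`, characters of one word merge into a single blob while the wider inter-word gaps stay open, which is the basic idea behind using smearing for word extraction.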

International Conference on Emerging Applications of Information Technology | 2014

Handwritten Bangla Word Recognition Using HOG Descriptor

Showmik Bhowmik; Md. Galib Roushan; Ram Sarkar; Mita Nasipuri; Sanjib Polley; Samir Malakar

Holistic approaches to handwritten word recognition treat each word as a single, indivisible entity and attempt to recognize it from its overall shape. In the present work, a novel technique for recognizing handwritten Bangla words is proposed. Histograms of Oriented Gradients (HOG) are used as the feature set to represent each word sample in the feature space, and a neural network based classifier is applied to classify the word images. With the HOG feature set, the technique achieves quite satisfactory performance on a small dataset.

International Conference on Communications | 2012

Text line extraction from handwritten document pages using spiral run length smearing algorithm

Samir Malakar; Sougata Halder; Ram Sarkar; Nibaran Das; Subhadip Basu; Mita Nasipuri

Extraction of text lines from document images is one of the important steps in an Optical Character Recognition (OCR) system. In handwritten document images, the presence of skewed, touching or overlapping text lines makes this process a real challenge for researchers. In the present work, a new text line extraction technique based on the Spiral Run Length Smearing Algorithm (SRLSA) is reported. First, the digitized document image is partitioned into a number of vertical fragments of equal width. Then all the text line segments present in these fragments are identified by applying SRLSA. Finally, neighboring text line segments are analyzed and, where necessary, merged so that each lies inside the boundary of the text line to which it actually belongs. For experimental purposes, the technique is tested on the CMATERdb1.1.1 and CMATERdb1.2.1 databases, from which it successfully extracts 87.09% and 89.35% of the text lines, respectively.

International Conference on Computational Intelligence and Communication Networks | 2014

Handwritten Bangla Word Recognition Using Elliptical Features

Showmik Bhowmik; Samir Malakar; Ram Sarkar; Mita Nasipuri

In the present work, a holistic word recognition technique is proposed for recognizing handwritten Bangla words. A holistic technique treats a word as a single, indivisible entity and extracts features from the entire word in order to recognize it. In this work, a set of elliptical features is extracted from handwritten word images to represent them in the feature space. The accuracies of five well-known classifiers are then compared to select a suitable classifier for evaluating the present work, and on that basis a neural network based classifier is chosen for the recognition task. Using the elliptical features, the proposed system provides satisfactory results on a small dataset.

International Journal of Computer Applications | 2010

A Script Independent Technique for Extraction of Characters from Handwritten Word Images

Ram Sarkar; Samir Malakar; Nibaran Das; Subhadip Basu; Mita Nasipuri

A script-independent technique for segmenting characters from word images is reported here. Word-to-character segmentation is an important preprocessing step in optical character recognition. In handwritten text, however, the presence of touching characters decreases the accuracy of character segmentation. In this paper, handwritten words in four different scripts, namely Bangla, Devanagari, Gurmukhi and Syloti, are considered as test samples. All these scripts are characterized by a distinct line along the top of most of the characters forming a word, called the headline or Matra. Unlike in English script, the components of characters in these handwritten scripts often encircle the main character, making conventional segmentation methodologies inapplicable. Two fuzzy features are used to identify the Matra region and the potential segmentation points. Experimental results on a sample of 400 handwritten word images covering all four scripts show success rates of 95.41%, 93.61%, 91.23% and 92.37% for Bangla, Devanagari, Gurmukhi and Syloti, respectively.

FICTA (1) | 2017

Bangla Handwritten City Name Recognition Using Gradient-Based Feature

Shilpi Barua; Samir Malakar; Showmik Bhowmik; Ram Sarkar; Mita Nasipuri

In recent times, holistic word recognition has attracted enormous attention from researchers because of its segmentation-free approach. In the present work, a holistic word recognition method is presented for recognizing handwritten city names in Bangla script. First, each word image is hypothetically segmented into an equal number of grids. Then gradient-based features, inspired by the Histogram of Oriented Gradients (HOG) feature descriptor, are extracted from each grid. To select a suitable classifier, five well-known classifiers are compared in terms of recognition accuracy, and the Sequential Minimal Optimization (SMO) classifier is finally chosen. The system achieves 90.65% accuracy on 10,000 samples comprising the 20 most popular city names of West Bengal, a state of India.
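
The grid-wise gradient features described above can be illustrated with a small Python sketch. It is a simplified HOG-style descriptor, not the paper's exact feature set: the function name `grid_gradient_features`, the central-difference gradients, the unsigned 0 to 180 degree orientation bins, and the absence of any normalization are assumptions made for illustration, and the classification step is omitted.

```python
import math


def grid_gradient_features(image, grid_rows, grid_cols, num_bins=8):
    """Concatenate one magnitude-weighted orientation histogram per grid cell.

    image : list of rows of numeric gray levels (or 0/1 values)
    Border pixels are skipped because central differences need both neighbours.
    """
    height, width = len(image), len(image[0])
    features = [0.0] * (grid_rows * grid_cols * num_bins)
    bin_width = 180.0 / num_bins
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal gradient
            gy = image[r + 1][c] - image[r - 1][c]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / bin_width), num_bins - 1)
            # Map the pixel to its grid cell.
            cell = (r * grid_rows // height) * grid_cols + (c * grid_cols // width)
            features[cell * num_bins + bin_index] += magnitude
    return features


# A vertical edge: every interior pixel has a purely horizontal gradient.
edge = [[0, 0, 1, 1] for _ in range(4)]
print(grid_gradient_features(edge, grid_rows=1, grid_cols=2, num_bins=4))
# [2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
```

The resulting fixed-length vector is what a classifier would consume; a fuller HOG implementation would also normalize histograms over blocks of cells.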

International Conference on Computing Communication and Networking Technologies | 2012

Text line extraction from handwritten document pages based on line contour estimation

Ram Sarkar; Sougata Halder; Samir Malakar; Nibaran Das; Subhadip Basu; Mita Nasipuri

Extraction of text lines from handwritten or printed document images is one of the important steps in an Optical Character Recognition (OCR) system. In handwritten document images, the presence of skewed, touching or overlapping text lines makes this process a real challenge for researchers. In the present work, a new text line extraction technique based on line contour estimation is reported. The digitized document image is first partitioned into a number of vertical fragments of equal width. Then all the line segments present in these vertical fragments are detected. Finally, neighboring line segments are analyzed so that each is placed inside the boundary of the line to which it actually belongs. For experimental purposes, the technique is tested on the CMATERdb1.2.1 database, from which it successfully extracts 88.44% of the text lines.
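
The first two steps, partitioning the page into equal-width vertical fragments and finding line segments in each, can be sketched in Python as below. The abstract does not say how segments are detected inside a fragment, so this sketch substitutes a plain horizontal projection (a row belongs to a segment if it contains any ink in that fragment). The function name `strip_line_segments` is hypothetical, and the final analysis and merging of neighboring segments is not attempted.

```python
def strip_line_segments(image, num_strips):
    """For each vertical strip, return (top_row, bottom_row) spans of inked rows.

    image : list of rows, each a list of 0 (background) or 1 (ink)
    Columns left over when the width is not divisible by num_strips are ignored.
    """
    height, width = len(image), len(image[0])
    strip_width = width // num_strips
    result = []
    for s in range(num_strips):
        left, right = s * strip_width, (s + 1) * strip_width
        spans, start = [], None
        for r in range(height):
            has_ink = any(image[r][left:right])
            if has_ink and start is None:
                start = r                      # a segment begins
            elif not has_ink and start is not None:
                spans.append((start, r - 1))   # the segment ended on the previous row
                start = None
        if start is not None:                  # segment runs to the bottom edge
            spans.append((start, height - 1))
        result.append(spans)
    return result


page = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(strip_line_segments(page, num_strips=2))
# [[(0, 0), (4, 4)], [(1, 2), (4, 4)]]
```

Working strip by strip is what lets such methods cope with skewed lines: within a narrow fragment a tilted line is nearly horizontal, so its rows can be separated locally and stitched together afterwards.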

International Conference on Emerging Applications of Information Technology | 2012

Two-stage skew correction of handwritten Bangla document images

Samir Malakar; Bhagesh Seraogi; Ram Sarkar; Nibaran Das; Subhadip Basu; Mita Nasipuri

Skew is common in handwritten document images. It is therefore necessary to detect and correct the skew before a document is presented to a document image analysis system. To this end, the present work develops a two-stage Hough transform based approach for removing skew from document images written in Bangla script. First, page-level skew is removed by rotating the skewed text lines appropriately; then any skewed words in each text line are rotated along a reference line.
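
As a rough illustration of Hough-style skew estimation, the Python sketch below votes over a set of candidate angles and keeps the angle at which the most ink points fall on a single line. It covers only a single angle estimate, not the paper's two-stage page-level and word-level procedure, and the function name `estimate_skew_degrees`, the (x, y) point format, and the one-pixel offset bins are assumptions made for illustration.

```python
import math
from collections import Counter


def estimate_skew_degrees(points, candidate_angles):
    """Return the candidate angle (in degrees) with the strongest line vote.

    points           : non-empty list of (x, y) ink coordinates
    candidate_angles : iterable of angles, measured in the same x/y frame
    For a line at angle a, the offset y*cos(a) - x*sin(a) is the same for
    every point on it, so many points sharing one rounded offset means a line.
    """
    best_angle, best_votes = None, -1
    for angle in candidate_angles:
        a = math.radians(angle)
        votes = Counter(round(y * math.cos(a) - x * math.sin(a)) for x, y in points)
        peak = max(votes.values())  # size of the fullest offset bin
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle


# Ten points on the line y = x are best explained by a 45 degree angle.
diagonal = [(x, x) for x in range(10)]
print(estimate_skew_degrees(diagonal, range(-45, 46, 15)))
# 45
```

Once an angle has been estimated, deskewing amounts to rotating the image (or the affected text line or word) by its negative.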

Archive | 2012

A Font Invariant Character Segmentation Technique for Printed Bangla Word Images

Ram Sarkar; Samir Malakar; Nibaran Das; Subhadip Basu; Mahantapas Kundu; Mita Nasipuri

A solution for segmenting Bangla word images, printed in different fonts with varying styles and sizes, into their constituent characters is reported here. First, three horizontally non-intersecting zones of a given word, namely the Upper, Middle and Lower Zones, are identified. Then the black pixels that probably constitute the common Matra of the word, a prominent feature of Bangla script, are estimated. Some of the black pixels in the Matra region are selected as potential segmentation points for segmenting the word vertically into its constituent characters. Each segmented component is then categorized as one of six possible component types: upper zone component, middle zone component, lower zone component, middle and lower zone component, broken character component, or noise component. Middle and lower zone components are separated horizontally. The methodology is tested on 1600 word images in different fonts with varying styles and sizes, and the average success rate achieved is 96.85%.
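
To make the idea of a Matra and of segmentation points on it concrete, here is a minimal Python sketch based on a simple projection heuristic. It is not the estimation procedure of the paper, which the abstract does not spell out. The sketch assumes that the Matra is the single row with the most ink and that a column is a candidate cut when it has ink on that row but none below it; both function names are hypothetical.

```python
def matra_row(word):
    """Index of the row with the most ink pixels, taken here as the headline."""
    counts = [sum(row) for row in word]
    return counts.index(max(counts))


def candidate_cut_columns(word, headline):
    """Columns that carry ink on the headline row but none below it."""
    height, width = len(word), len(word[0])
    return [
        c for c in range(width)
        if word[headline][c] == 1
        and not any(word[r][c] for r in range(headline + 1, height))
    ]


# Two character bodies hanging from a continuous headline (row 1).
word = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
]
row = matra_row(word)
print(row, candidate_cut_columns(word, row))
# 1 [0, 3, 6]
```

Column 3 separates the two character bodies, while columns 0 and 6 are margin positions that a real segmenter would discard. Handwritten words, broken headlines, and lower-zone components all break this heuristic.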

International Conference Information Processing | 2011

Binarization of the Noisy Document Images: A New Approach

Samir Malakar; Dheeraj Mohanta; Ram Sarkar; Nibaran Das; Mita Nasipuri; Dipak Kr. Basu

The work reported here proposes a new methodology for determining the threshold value used to binarize noisy or noise-free digitized document images. First, the Middle of Modal Class (MMC) filtering technique [1], one of our earlier works, is applied to the digitized document image to smooth the noisy pixels. From that information, two sets of gray-level values are identified, one clearly representing objects and the other clearly representing background. The remaining gray-level values, which form a third set, are reserved for calculating the threshold. The mean gray-level value of all pixels whose gray levels fall in this third set is then used as the threshold for binarization. Our results are compared with iterative thresholding [2] and Otsu's thresholding [2], [3], and the output images show that the present methodology provides satisfactory results.
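
The thresholding step can be sketched in a few lines of Python. The abstract does not explain how the object and background sets are derived from the MMC-filtered image, so the sketch takes their limits as inputs: `object_max` (the highest gray level treated as certainly object) and `background_min` (the lowest gray level treated as certainly background) are hypothetical parameters, as are the function names and the midpoint fallback for an empty third set. Dark ink on a light page is assumed.

```python
def ambiguous_band_threshold(pixels, object_max, background_min):
    """Mean gray level of the pixels that are neither clearly object nor background.

    pixels : flat list of gray levels (dark ink on a light page assumed)
    """
    ambiguous = [p for p in pixels if object_max < p < background_min]
    if not ambiguous:
        # Fallback when the third set is empty (an assumption, not from the paper).
        return (object_max + background_min) / 2
    return sum(ambiguous) / len(ambiguous)


def binarize(pixels, threshold):
    """Map gray levels to 0 (object) at or below the threshold, else 1 (background)."""
    return [0 if p <= threshold else 1 for p in pixels]


gray = [10, 20, 100, 120, 140, 230, 250]
t = ambiguous_band_threshold(gray, object_max=50, background_min=200)
print(t, binarize(gray, t))
# 120.0 [0, 0, 0, 0, 1, 1, 1]
```

Because only the uncertain pixels contribute to the mean, large uniform regions of clear ink or clear paper do not pull the threshold toward either extreme.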
