Nibaran Das | Researchain

Publication

Featured research published by Nibaran Das.

Applied Soft Computing | 2012

A genetic algorithm based region sampling for selection of local features in handwritten digit recognition application

Nibaran Das; Ram Sarkar; Subhadip Basu; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

Identifying the local regions from which optimal discriminating features can be extracted is one of the major tasks in pattern recognition. Different kinds of region sampling techniques are used in the literature to locate such regions, but there is no standard methodology for identifying them exactly. Here we propose a methodology in which local regions of varying heights and widths are created dynamically. A genetic algorithm (GA) is then applied to these local regions to sample the optimal set of local regions, from which a feature set with the best discriminating power can be extracted. We have evaluated the proposed methodology on a data set of handwritten Bangla digits. In the present work, we randomly generated seven sets of local regions, and from every set the GA selects an optimal group of local regions that produces the best recognition performance with a support vector machine (SVM) based classifier. Other popular optimization techniques, namely simulated annealing (SA) and hill climbing (HC), have also been evaluated on the same data set; the maximum recognition accuracies were found to be 97%, 96.7% and 96.7% for GA, SA and HC, respectively. We have also compared the performance of the present technique with those of other zone based techniques on the same database.
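The selection loop described in this abstract can be illustrated with a minimal sketch. It assumes that each candidate group of regions is encoded as a binary mask (1 = region kept) and uses a toy fitness function in place of the SVM recognition accuracy used in the paper. The function names, the GA parameters and the `USEFUL` set are hypothetical and are not taken from the source.

```python
import random

def tournament(population, fitness, rng):
    """Pick the fitter of two randomly chosen individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_region_selection(n_regions, fitness, pop_size=20, generations=30,
                             crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return a binary mask over candidate regions with high fitness.

    Assumption: in the paper the fitness would be the recognition accuracy
    of a classifier trained on features from the selected regions; here it
    is any callable that scores a mask.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_regions)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best mask forward
        while len(next_population) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_regions - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best

# Hypothetical toy fitness: regions 1, 4 and 6 are informative, others add cost.
USEFUL = {1, 4, 6}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in USEFUL)
    extras = sum(1 for i, g in enumerate(mask) if g and i not in USEFUL)
    return hits - 0.5 * extras

selected = genetic_region_selection(8, toy_fitness)
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next. Simulated annealing or hill climbing could be compared by swapping the search loop while keeping the same mask encoding and fitness function.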

Pattern Recognition | 2009

A hierarchical approach to recognition of handwritten Bangla characters

Subhadip Basu; Nibaran Das; Ram Sarkar; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

A novel hierarchical approach is presented here for optical character recognition (OCR) of handwritten Bangla words. Instead of dealing with isolated characters as found in selected works [T.K. Bhowmik, U. Bhattacharya, S.K. Parui, Recognition of Bangla handwritten characters using an MLP classifier based on stroke features, in: Proceedings of the ICONIP, Kolkata, India, 2004, pp. 814-819; K. Roy, U. Pal, F. Kimura, Bangla handwritten character recognition, in: Proceedings of the Second Indian International Conference on Artificial Intelligence (IICAI), 2005, pp. 431-443; S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Handwritten Bangla alphabet recognition using an MLP based classifier, in: Proceedings of the Second National Conference on Computer Processing of Bangla, Dhaka, 2005, pp. 285-291; A.F.R. Rahman, R. Rahman, M.C. Fairhurst, Recognition of handwritten Bengali characters: a novel multistage approach, Pattern Recognition 35, 2002, pp. 997-1006; U. Bhattacharya, S.K. Parui, M. Sridhar, F. Kimura, Two-stage recognition of handwritten Bangla alphanumeric characters using neural classifiers, in: Proceedings of the Second Indian International Conference on Artificial Intelligence (IICAI), 2005, pp. 1357-1376; U. Bhattacharya, M. Sridhar, S.K. Parui, On recognition of handwritten Bangla characters, in: Proceedings of the ICVGIP-06, Lecture Notes in Computer Science, vol. 4338, 2006, pp. 817-828], the present approach segments a word image on Matra hierarchy, then recognizes the individual word segments and finally identifies the constituent characters of the word image through an intelligent combination of the recognition decisions of the associated word segments. Because consecutive characters of Bangla words may appear at overlapping character positions, segmentation of Bangla word images is not easy. For successful OCR of handwritten Bangla text, not only recognition but also segmentation of word images is important.
In this respect the present hierarchical approach deals with both segmentation and recognition of handwritten Bangla word images for a complete solution to the handwritten word recognition problem, an essential area of OCR of handwritten Bangla text. In dealing with a certain category of word segments, created on Matra hierarchy, a sophisticated recognition technique, viz., the two-pass approach [S. Basu, C. Chaudhury, M. Kundu, M. Nasipuri, D.K. Basu, A two pass approach to pattern classification, in: N.R. Pal et al. (Ed.), Lecture Notes in Computer Science, vol. 3316, ICONIP, Kolkata, 2004, pp. 781-786], is employed here. The degree of sophistication of the classification technique is also tuned according to the category of word segment to be recognized. For example, the two-pass approach is employed here for recognizing middle zone character segments, whereas recognition of middle zone modified shapes of Bangla script is done through simple template matching. Considering the learning and generalization abilities of multi-layer perceptrons (MLPs), MLP based pattern classifiers are used here for most of the classification related tasks. A powerful feature set is also designed under this work for recognition of complex character patterns using three types of topological features, viz., longest-run features, modified shadow features and octant-centroid features. In a nutshell, the work deals with a practical problem of OCR of Bangla text involving recognition as well as segmentation of constituent characters of handwritten Bangla words.

Applied Soft Computing | 2012

A statistical-topological feature combination for recognition of handwritten numerals

Nibaran Das; Jagan Mohan Reddy; Ram Sarkar; Subhadip Basu; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

Principal Component Analysis (PCA) and Modular PCA (MPCA) are well known statistical methods for recognition of facial images. However, PCA/MPCA alone is found to be insufficient to achieve the high classification accuracy required for handwritten character recognition applications. This is due to the inability of those methods to represent certain local morphometric information present in the character patterns. On the other hand, Quad-tree based hierarchically derived Longest-Run (QTLR) features, a popular type of topological feature for character recognition, miss some global statistical information of the characters. In this paper, we introduce a new combination of PCA/MPCA and QTLR features for OCR of handwritten numerals. The performance of the designed feature combination is evaluated on handwritten numerals of five popular scripts of the Indian sub-continent, viz., Arabic, Bangla, Devanagari, Latin and Telugu, with a Support Vector Machine (SVM) based classifier. The results show that the MPCA+QTLR feature combination outperforms the PCA+QTLR feature combination and most other conventional features available in the literature.
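To make the topological half of the feature combination more concrete, the following is a rough sketch of longest-run features computed over a one-level quad-tree partition of a binary image. It makes several assumptions that the abstract does not state: only row-wise and column-wise runs are counted, the image is split at the centre of gravity of its black pixels, and no normalization is applied. All function names are hypothetical.

```python
def longest_run(line):
    """Length of the longest run of consecutive black (truthy) pixels."""
    best = current = 0
    for pixel in line:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best

def longest_run_features(region):
    """Sum of per-row and per-column longest runs for one region."""
    if not region or not region[0]:
        return (0, 0)
    row_sum = sum(longest_run(row) for row in region)
    col_sum = sum(longest_run(col) for col in zip(*region))
    return (row_sum, col_sum)

def centre_of_gravity(image):
    """Rounded mean (row, col) of black pixels; image centre if none."""
    points = [(r, c) for r, row in enumerate(image)
              for c, pixel in enumerate(row) if pixel]
    if not points:
        return (len(image) // 2, len(image[0]) // 2)
    mean_r = sum(r for r, _ in points) / len(points)
    mean_c = sum(c for _, c in points) / len(points)
    return (round(mean_r), round(mean_c))

def quadtree_features(image, depth=1):
    """Longest-run features of the image and, recursively, of its quadrants."""
    features = list(longest_run_features(image))
    if depth == 0 or len(image) < 2 or len(image[0]) < 2:
        return features
    r, c = centre_of_gravity(image)
    # Clamp the split point so that every quadrant is non-empty.
    r = min(max(r, 1), len(image) - 1)
    c = min(max(c, 1), len(image[0]) - 1)
    quadrants = [
        [row[:c] for row in image[:r]], [row[c:] for row in image[:r]],
        [row[:c] for row in image[r:]], [row[c:] for row in image[r:]],
    ]
    for quadrant in quadrants:
        features.extend(quadtree_features(quadrant, depth - 1))
    return features

# A tiny 4x4 "T"-like pattern used only for illustration.
image = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
features = quadtree_features(image, depth=1)
```

In a combined scheme, a vector like `features` would be concatenated with the PCA or MPCA projection of the same image before classification; how the two parts are scaled relative to each other is a design choice the abstract does not describe.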

Pattern Recognition | 2010

A novel framework for automatic sorting of postal documents with multi-script address blocks

Subhadip Basu; Nibaran Das; Ram Sarkar; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

Recognition of numeric postal codes in a multi-script environment is a classical problem in any postal automation system. In such postal documents, determination of the script of the handwritten postal codes is crucial for subsequent invocation of the digit recognizers for the respective scripts. The current framework attempts to infer the script of the numeric postal code without any bias from the script of the textual part of the address block, as the two might differ in a multi-script environment. The scope of the current work is to recognize postal codes written in any of four popular scripts, viz., Latin, Devanagari, Bangla and Urdu. For this purpose, we first implement a Hough transformation based technique to localize the postal-code blocks in structured postal documents with a defined address block region. Isolated handwritten digit patterns are then extracted from the localized postal-code region. In the next stage of the developed framework, similarly shaped digit patterns of the four scripts are grouped into 25 clusters. A script independent unified pattern classifier is then designed to classify the numeric postal codes into one of these 25 clusters. Based on these classification decisions, a rule-based script inference engine is designed to infer the script of the numeric postal code. One of the four script specific classifiers is subsequently invoked to recognize the digit patterns of the corresponding script. A novel quad-tree based image partitioning technique is also developed in this work for effective feature extraction from the numeric digit patterns. The average recognition accuracy over ten-fold cross validation for the support vector machine (SVM) based 25-class unified pattern classifier is 92.03%. With randomly selected six-digit numeric strings of four different scripts, an average script inference accuracy of 96.72% is achieved. The average ten-fold cross-validation recognition accuracies of the individual SVM classifiers for the Latin, Devanagari, Bangla and Urdu numerals are observed to be 95.55%, 95.63%, 97.15% and 96.20%, respectively.

International Conference on Computing Theory and Applications | 2007

A Fuzzy Technique for Segmentation of Handwritten Bangla Word Images

Subhadip Basu; Ram Sarkar; Nibaran Das; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

A fuzzy technique for segmentation of handwritten Bangla word images is presented. It works in two steps. In the first step, the black pixels constituting the Matra (i.e., the longest horizontal line joining the tops of individual characters of a Bangla word) in the target word image are identified by using a fuzzy feature. In the second step, some of the black pixels on the Matra are identified as segment points (i.e., the points through which the word is to be segmented) by using three fuzzy features. On experimentation with a set of 210 samples of handwritten Bangla words, collected from different sources, the average success rate of the technique is shown to be 95.32%. Apart from certain limitations, the technique can be considered a significant step towards the development of a full-fledged Bangla OCR system, especially for handwritten documents.

Pattern Recognition and Machine Intelligence | 2005

Handwritten Bangla digit recognition using classifier combination through DS technique

Subhadip Basu; Ram Sarkar; Nibaran Das; Mahantapas Kundu; Mita Nasipuri; Dipak Kumar Basu

The work presents an application of the Dempster-Shafer (DS) technique for combining classification decisions obtained from two Multi Layer Perceptron (MLP) based classifiers for optical character recognition (OCR) of handwritten Bangla digits using two different feature sets. Bangla is the second most popular script in the Indian subcontinent and the fifth most popular language in the world. The two feature sets used for the work are designed so that they supply complementary information, at least to some extent, about the classes of digit patterns to the MLP classifiers. On experimentation with a database of 6000 samples, the technique is found to improve recognition performance by a minimum of 1.2% and a maximum of 2.32% compared to the average recognition rate of the individual MLP classifiers after 3-fold cross validation of results. The overall recognition rate observed is 95.1% on average.
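The combination step can be illustrated with Dempster's rule of combination. The sketch below assumes that each classifier's output has already been turned into a mass function over sets of digit classes, with some mass left on the whole frame to express uncertainty. How the paper derives masses from MLP outputs is not stated in the abstract, so the mass values and names here are purely illustrative.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}

FRAME = frozenset(range(10))  # the ten digit classes

# Hypothetical evidence from two classifiers trained on different feature sets.
m_first = {frozenset({3}): 0.6, frozenset({8}): 0.3, FRAME: 0.1}
m_second = {frozenset({3}): 0.5, frozenset({5}): 0.3, FRAME: 0.2}

fused = dempster_combine(m_first, m_second)
decision = max((s for s in fused if len(s) == 1), key=fused.get)
```

In this example both sources partly support digit 3, so the fused mass on `{3}` (0.47 / 0.58, about 0.81) is higher than either source assigned on its own, which is the kind of reinforcement that complementary feature sets are meant to provide.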

Pattern Recognition and Machine Intelligence | 2009

Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis

Abhishek Khandelwal; Pritha Choudhury; Ram Sarkar; Subhadip Basu; Mita Nasipuri; Nibaran Das

Text line extraction is the first and one of the most critical steps in optical character recognition (OCR) of unconstrained handwritten documents. The present work reports a new methodology based on comparison of neighborhood connected components to determine whether they belong to the same text line. Components which are very small or very large compared to the average component height are ignored in the preprocessing step. During post-processing, such components are reconsidered and allocated to the lines to which they most suitably belong. The performance of the developed technique is evaluated on the benchmark training dataset for the ICDAR 2009 handwriting segmentation contest. The dataset consists of English, French, German and Greek handwritten texts. The overall text line identification accuracy on the mentioned dataset is observed to be around 93.35%.
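The pre-processing rule in this abstract (setting aside components that are very small or very large relative to the average component height) might look like the following sketch. The thresholds `low` and `high` are hypothetical, since the abstract does not give the ratios used, and components are represented only by their bounding-box heights.

```python
def split_components_by_height(heights, low=0.5, high=2.0):
    """Separate regular components from those deferred to post-processing.

    heights: bounding-box heights of connected components, in pixels.
    low, high: assumed multiples of the average height (not from the source).
    Returns (regular_indices, deferred_indices).
    """
    if not heights:
        return [], []
    average = sum(heights) / len(heights)
    regular, deferred = [], []
    for index, height in enumerate(heights):
        if low * average <= height <= high * average:
            regular.append(index)
        else:
            deferred.append(index)  # e.g. dots, accents, or touching lines
    return regular, deferred

regular, deferred = split_components_by_height([20, 22, 3, 19, 70, 21])
```

Only the `regular` components would take part in the neighbourhood comparison that builds text lines; the `deferred` ones would be assigned afterwards to whichever line suits them best, as the abstract describes.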

Pattern Recognition | 2016

A multi-objective approach towards cost effective isolated handwritten Bangla character and digit recognition

Ritesh Sarkhel; Nibaran Das; Amit Saha; Mita Nasipuri

Identifying the most informative local regions of a handwritten character image is necessary for a robust handwritten character recognition system, but identifying them from a character image is a difficult task. If this task is to be performed at the minimum possible cost, it becomes more challenging because two independent, apparently contradictory objectives need to be optimized simultaneously, i.e., maximizing the recognition accuracy and minimizing the associated recognition cost. To address the problem, a multi-objective approach is required. In the present work, two popular multi-objective optimization algorithms, (1) a Non-Dominated Sorting Harmony-Search Algorithm (NSHA) and (2) a Non-Dominated Sorting Genetic Algorithm-II (NSGA-II, Deb et al., 2002), are employed separately for region sampling. The method objectively selects the most informative set of local regions, using the framework of Axiomatic Fuzzy Set (AFS) theory, from the sets of Pareto-optimal solutions provided by the multi-objective region sampling algorithms. The system has been evaluated separately on two isolated handwritten Bangla datasets, (1) a dataset of randomly mixed handwritten Bangla Basic and Compound characters and (2) a dataset of handwritten Bangla numerals, with an SVM based classifier, using a feature set containing convex-hull based features and CG based quad-tree partitioned longest-run based local features extracted from the selected local regions. The results have shown a significant increase in recognition accuracy and decrease in recognition cost for all the datasets. Thus the present system introduces a cost effective approach towards isolated handwritten character recognition systems.
[Figure omitted: Schematic representation of the integrated system developed under present work.]
Developed a cost effective approach towards handwritten character recognition system. A multi-objective region sampling methodology for isolated handwritten Bangla characters and digits recognition has been proposed. A non-dominated sorting harmony search algorithm based region sampling and a non-dominated sorting genetic algorithm based region sampling methodology have been developed. An AFS theory based fuzzy logic is utilized to develop a model for combining the pareto-optimal solutions from two multi-objective heuristics algorithms. Maximum recognition accuracies of 86.6478% and 98.23% have been achieved with 0.234% and 12.60% decrease in recognition cost for handwritten Bangla characters and digits respectively.
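The notion of a Pareto-optimal set for the two objectives named above (maximize accuracy, minimize cost) can be shown with a short sketch. It extracts only the first non-dominated front, which is the basic building block of non-dominated sorting; it is not an implementation of NSHA or NSGA-II. The candidate (accuracy, cost) pairs are invented for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives
    and strictly better on at least one.

    Each solution is an (accuracy, cost) pair: accuracy is maximized,
    cost is minimized.
    """
    accuracy_a, cost_a = a
    accuracy_b, cost_b = b
    no_worse = accuracy_a >= accuracy_b and cost_a <= cost_b
    strictly_better = accuracy_a > accuracy_b or cost_a < cost_b
    return no_worse and strictly_better

def pareto_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical region subsets scored as (recognition accuracy, recognition cost).
candidates = [(0.95, 40), (0.93, 25), (0.90, 30), (0.97, 60), (0.93, 35)]
front = pareto_front(candidates)
```

Here `(0.90, 30)` and `(0.93, 35)` are both dominated by `(0.93, 25)`, so three trade-off solutions remain. A final choice among them still has to be made by some additional criterion, which is the role the abstract assigns to AFS theory.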

soft computing | 2014

Script identification from printed Indian document images and performance evaluation using different classifiers

Sk Md Obaidullah; Anamika Mondal; Nibaran Das; Kaushik Roy

Identification of script from document images is an active area of research in document image processing for a multilingual/multiscript country like India. In this paper, the real-life problem of printed script identification from official Indian document images is considered and the performances of different well-known classifiers are evaluated. Two important evaluation parameters, namely AAR (average accuracy rate) and MBT (model building time), are computed for this performance analysis. The experiment was carried out on 459 printed document images with 5-fold cross-validation. The Simple Logistic model shows the highest AAR of 98.9%. The BayesNet and Random Forest models have average accuracy rates of 96.7% and 98.2%, respectively, with the lowest MBT of 0.09 s.

Polibits | 2014

Comparison of Different Graph Distance Metrics for Semantic Text Based Classification

Nibaran Das; Swarnendu Ghosh; Teresa Gonçalves; Paulo Quaresma

Semantic information of text is now widely used for text classification instead of bag-of-words approaches. This is because bag-of-words approaches have limitations in representing text appropriately for certain kinds of documents. Semantic information, on the other hand, can be represented through feature vectors or graphs. Of the two, a graph is normally better than a traditional feature vector due to its powerful data structure. However, very few methodologies exist in the literature for semantic representation with graphs. Error tolerant graph matching techniques such as graph similarity measures can be utilised for text classification. However, techniques such as Maximum Common Subgraph (mcs) and Minimum Common Supergraph (MCS) for graph similarity measures are computationally NP-hard. In the present paper, summarized texts are used during extraction of semantic information to make it computationally faster. The semantic information of the texts is represented through discourse representation structures and later transformed into graphs. Five different graph distance measures based on Maximum Common Subgraph (mcs) and Minimum Common Supergraph (MCS) are used with a k-NN classifier to evaluate the text classification task. The text documents are taken from the Reuters-21578 text database and are distributed over 20 classes. Ten documents of each class are used for both training and testing in the present work. The results show that the techniques have more or less equivalent potential for text classification and are as good as traditional bag-of-words approaches.
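As an illustration of an mcs-based graph distance, the sketch below implements one well-known form, 1 - |mcs(G1, G2)| / max(|G1|, |G2|). The abstract does not say which five measures the paper uses, so this is only a representative example. It also assumes that node labels are unique within each graph, in which case the maximum common subgraph reduces to the shared nodes and shared edges and can be computed with set intersections rather than an NP-hard search. Graph size is taken here as nodes plus edges, which is likewise an assumption.

```python
def graph_size(graph):
    """Size of a graph given as (nodes, edges), both sets."""
    nodes, edges = graph
    return len(nodes) + len(edges)

def mcs_distance(g1, g2):
    """Distance 1 - |mcs| / max(|g1|, |g2|) for uniquely labelled graphs.

    Each graph is (set_of_node_labels, set_of_directed_edges), where an
    edge is a (source_label, target_label) tuple.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # With unique labels, the common subgraph is just the overlap.
    mcs = (nodes1 & nodes2, edges1 & edges2)
    largest = max(graph_size(g1), graph_size(g2))
    if largest == 0:
        return 0.0
    return 1.0 - graph_size(mcs) / largest

# Two tiny hypothetical semantic graphs built from short texts.
g_raise = ({"bank", "raise", "rate"},
           {("bank", "raise"), ("raise", "rate")})
g_cut = ({"bank", "cut", "rate"},
         {("bank", "cut"), ("cut", "rate")})

distance = mcs_distance(g_raise, g_cut)
```

A k-NN classifier would then label a test document with the majority class among the k training graphs at the smallest distance from it.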