Tapan Kumar Bhowmik
University of Groningen
Publications
Featured research published by Tapan Kumar Bhowmik.
International Journal on Document Analysis and Recognition | 2009
Tapan Kumar Bhowmik; Pradip Ghanty; Anandarup Roy; Swapan K. Parui
We propose support vector machine (SVM) based hierarchical classification schemes for recognition of handwritten Bangla characters. A comparative study is made among multilayer perceptron, radial basis function network and SVM classifiers for this 45-class recognition problem, and the SVM classifier is found to outperform the other two. A fusion scheme combining the three classifiers is proposed, which is marginally better than the SVM classifier alone. It is observed that some groups of characters have similar shapes. These groups are determined in two different ways from the confusion matrix of the SVM classifier: in the first, the groups are disjoint; in the second, they overlap. A third grouping scheme, again with disjoint groups, is based on the confusion matrix obtained from the neural gas algorithm. Three different two-stage hierarchical learning architectures (HLAs) are proposed using the three grouping schemes. In the first stage, an unknown character image is classified into a group; the second stage recognizes the class within that group. The HLA schemes perform better than single-stage classification schemes, and the HLA scheme with overlapped groups outperforms the other two.
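A minimal sketch of how disjoint groups might be derived from a confusion matrix and then used in a two-stage classifier. The linking rule (two classes share a group when either is misclassified as the other above a chosen rate) and the names `disjoint_groups`, `two_stage_predict` and `threshold` are assumptions for illustration, not the grouping criterion of the paper; the overlapped and neural-gas groupings are not shown.

```python
import numpy as np


def disjoint_groups(confusion, threshold):
    """Group classes that are confused with each other more often than `threshold`.

    Rows of `confusion` are true classes and columns are predicted classes.
    Two classes are linked if either one is predicted as the other at a rate
    above `threshold`; the groups are the connected components of that graph.
    """
    counts = np.asarray(confusion, dtype=float)
    rates = counts / counts.sum(axis=1, keepdims=True)  # row-normalise to rates
    linked = (rates > threshold) | (rates.T > threshold)
    n = len(rates)
    group_of = [-1] * n
    groups = []
    for start in range(n):
        if group_of[start] != -1:
            continue
        group_id = len(groups)
        group_of[start] = group_id
        stack, members = [start], []
        while stack:  # depth-first search over the confusion graph
            c = stack.pop()
            members.append(c)
            for d in range(n):
                if d != c and linked[c, d] and group_of[d] == -1:
                    group_of[d] = group_id
                    stack.append(d)
        groups.append(sorted(members))
    return groups


def two_stage_predict(x, group_classifier, class_classifiers):
    """Stage 1 assigns x to a group; stage 2 picks a class within that group."""
    group_id = group_classifier(x)
    return class_classifiers[group_id](x)
```

In practice each entry of `class_classifiers` would be a classifier (for example an SVM) trained only on the classes of its group, so the second stage faces a much smaller problem than the full 45-class task.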
Digital Image Computing: Techniques and Applications | 2005
Anandarup Roy; Tapan Kumar Bhowmik; Swapan K. Parui; Utpal Roy
Character segmentation is a necessary preprocessing step for character recognition in many handwritten word recognition systems. The most difficult case for character segmentation is cursive script. The fully cursive nature of Bangla handwriting and the natural skewness of words pose challenges for automatic character segmentation. This article presents a novel approach to skew detection and correction, as well as character segmentation, using handwritten Bangla words as a test case. Segmenting points are extracted on the basis of patterns observed in the handwritten words, and a graphical path (hereafter referred to as a candidate path) is constructed from these points. Handwritten words contain both consistent and inconsistent skewness, and the algorithm copes with both types simultaneously. Furthermore, the method is direct: the candidate path alone suffices to handle both skew correction and segmentation. The algorithm has been tested on a database prepared for laboratory use and yields fairly good results on this database.
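As an illustration of why a single path can correct both constant and varying skew, the sketch below assumes the candidate path is given as segmenting-point coordinates sorted by x, interpolates it linearly between points, and shifts every image column so the path becomes horizontal. The function name and the column-shifting scheme are hypothetical; the paper's actual construction of the candidate path is not reproduced here.

```python
import numpy as np


def straighten_along_path(image, path_x, path_y):
    """Shift each column so the (hypothetical) candidate path becomes horizontal.

    path_x, path_y: coordinates of segmenting points, sorted by x.
    Between points the path is linearly interpolated, so a constant
    (consistent) skew and a varying (inconsistent) skew are both removed.
    """
    h, w = image.shape
    ys = np.interp(np.arange(w), path_x, path_y)  # path height at every column
    target = int(round(float(ys.mean())))
    out = np.zeros_like(image)
    for x in range(w):
        shift = target - int(round(float(ys[x])))
        shift = max(-h, min(h, shift))  # columns pushed fully out of frame become empty
        column = image[:, x]
        if shift >= 0:
            out[shift:, x] = column[: h - shift]
        else:
            out[: h + shift, x] = column[-shift:]
    return out
```

Because the correction is local to each column, a word whose baseline bends halfway through is flattened just as well as one that is uniformly tilted.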
International Conference on Pattern Recognition | 2008
Tapan Kumar Bhowmik; Swapan K. Parui; Utpal Roy
This paper presents a recognition system for isolated handwritten Bangla words, with a fixed lexicon, using a left-right hidden Markov model (HMM). A stochastic search method, namely a genetic algorithm (GA), is used to train the HMM. New shape-based direction-encoding features have been developed and introduced into the recognition system. Both non-discriminative and discriminative training procedures are applied iteratively to optimize the parameters of the HMM.
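A minimal sketch of GA-style training of a left-right HMM, assuming discrete observation symbols, a fixed emission matrix `B`, and log-likelihood of the training sequences as the fitness. Only the transition matrix is evolved, using selection with elitism and mutation (crossover is omitted); the function names and parameters are hypothetical and the discriminative training step is not shown.

```python
import numpy as np


def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | pi, A, B) for discrete symbols."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    return loglik + np.log(alpha.sum())


def mutate(A, rng, sigma=0.05):
    """Perturb only the allowed transitions, so the left-right structure is kept."""
    allowed = A > 0
    noisy = np.where(allowed, np.abs(A + rng.normal(0.0, sigma, A.shape)) + 1e-6, 0.0)
    return noisy / noisy.sum(axis=1, keepdims=True)


def train_transitions_ga(sequences, pi, A, B, generations=20, pop_size=10, seed=0):
    """Evolve the transition matrix A with selection and mutation (no crossover).

    B (the symbol emission matrix) is kept fixed to keep the sketch short.
    """
    rng = np.random.default_rng(seed)

    def fitness(M):
        return sum(forward_loglik(o, pi, M, B) for o in sequences)

    population = [A] + [mutate(A, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = [mutate(parents[rng.integers(len(parents))], rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)
```

Because the best individual always survives, the returned model is never worse on the training data than the starting one; unlike Baum-Welch, the search does not rely on gradients or expectation steps.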
Pattern Recognition and Machine Intelligence | 2011
Ranjit Ghoshal; Anandarup Roy; Tapan Kumar Bhowmik; Swapan K. Parui
The goal of this article is to design an effective scheme for extracting Bangla/Devnagari text from outdoor images. We first segment a color image using the fuzzy c-means algorithm. In Bangla/Devnagari script, text may be attached or unattached to the headlines. Hence, after segmentation, headlines are detected in each connected component using morphology. Next, the components attached or close to the detected headlines are separated. Finally, shape- and position-based purification distinguishes text from non-text. Experiments on a dataset of 100 outdoor images containing Bangla and/or Devnagari text show satisfactory performance.
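The colour segmentation step can be sketched with a plain fuzzy c-means implementation on pixel colours. This assumes pixels are given as rows of an `(n_pixels, 3)` array and uses the standard fuzzifier `m = 2`; the number of clusters and stopping tolerance are illustrative choices, not values from the paper.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Cluster rows of X (e.g. RGB pixels) into c fuzzy clusters.

    Returns the cluster centres and the membership matrix U, where U[i, j]
    is the degree to which pixel i belongs to cluster j (rows sum to 1).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # memberships sum to 1 per pixel
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero at a centre
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

For an `H x W x 3` image, `fuzzy_c_means(img.reshape(-1, 3), c)[1].argmax(axis=1).reshape(H, W)` gives a hard label map whose connected components can then be searched for headlines.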
Document Analysis Systems | 2012
Tapan Kumar Bhowmik; Utpal Roy; Swapan K. Parui
In this paper, we introduce a stroke-based lexicon reduction technique to reduce the search space for recognition of handwritten words. The technique builds a feature vector from two aspects of a word image: its length and its shape. The length of the word image is represented by the number of specific vertical strokes present in it, while its shape is captured by the combination of horizontal and vertical strokes. Experiments have been carried out on a database of 35,700 off-line handwritten Bangla word images. Although the proposed lexicon reduction technique was developed for recognition of Bangla handwritten words, its generalization property can readily be exploited for recognition of handwriting in other scripts.
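A minimal sketch of two-aspect lexicon reduction, assuming each lexicon word has a precomputed expected vertical-stroke count and a shape string of stroke codes (`'V'` vertical, `'H'` horizontal). The string encoding, the Levenshtein comparison and the names `reduce_lexicon`, `length_tol` and `keep` are hypothetical stand-ins for the paper's feature vector and matching.

```python
def edit_distance(a, b):
    """Levenshtein distance between two stroke-code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def reduce_lexicon(word_length, word_shape, lexicon, length_tol=1, keep=10):
    """Return lexicon entries compatible with the image's stroke features.

    word_length : number of vertical strokes detected in the word image
    word_shape  : stroke-code string for the word image, e.g. "VHV"
    lexicon     : dict mapping word -> (expected_length, expected_shape)
    """
    candidates = []
    for word, (exp_len, exp_shape) in lexicon.items():
        if abs(exp_len - word_length) > length_tol:
            continue  # prune on length first: cheap and coarse
        candidates.append((edit_distance(word_shape, exp_shape), word))
    candidates.sort()  # closest shapes first
    return [word for _, word in candidates[:keep]]
```

Filtering on length before comparing shapes means the more expensive shape comparison only runs on the words that survive the first, coarse test.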
Pattern Recognition and Machine Intelligence | 2011
Tapan Kumar Bhowmik; Jean-Paul van Oosten; Lambert Schomaker
This paper investigates the performance of hidden Markov models (HMMs) for handwriting recognition. The Segmental K-Means algorithm is used for updating the transition and observation probabilities, instead of the Baum-Welch algorithm. Observation probabilities are modelled as multivariate Gaussian mixture distributions. A deterministic clustering technique is used to estimate the initial parameters of an HMM. The Bayesian information criterion (BIC) is used to select the topology of the model. The wavelet transform is used to extract features from a grey-scale image, which avoids the need to binarize the image.
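Topology selection with BIC can be sketched as follows, using the convention BIC = -2 ln L + k ln n (lower is better). The parameter count assumes a left-right HMM with one self-loop and one forward transition per state and Gaussian mixtures with diagonal covariances by default; these are assumptions, since the abstract does not state the exact topology family or covariance type, and the function names are hypothetical.

```python
import math


def hmm_num_params(n_states, n_mix, dim, left_right=True, diagonal=True):
    """Free parameters of a Gaussian-mixture HMM (assumed topology, see text)."""
    if left_right:
        # start in state 0; each state but the last has a self-loop and a
        # forward transition (one free value); the last state only loops.
        transitions, initial = n_states - 1, 0
    else:
        transitions, initial = n_states * (n_states - 1), n_states - 1
    weights = n_states * (n_mix - 1)
    means = n_states * n_mix * dim
    cov_per_comp = dim if diagonal else dim * (dim + 1) // 2
    covariances = n_states * n_mix * cov_per_comp
    return initial + transitions + weights + means + covariances


def bic(log_likelihood, n_params, n_observations):
    """BIC = -2 ln L + k ln n; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_observations)


def select_topology(candidates, dim, n_observations):
    """Pick the (n_states, n_mix) whose trained model has the lowest BIC.

    candidates: iterable of (n_states, n_mix, log_likelihood) tuples.
    """
    best = min(candidates,
               key=lambda c: bic(c[2], hmm_num_params(c[0], c[1], dim), n_observations))
    return best[0], best[1]
```

The penalty term grows with the number of states and mixture components, so a larger model is only chosen when its gain in log-likelihood outweighs the extra parameters.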
Pattern Recognition and Machine Intelligence | 2007
Tapan Kumar Bhowmik; Swapan K. Parui; Manika Kar; Utpal Roy
This paper presents a recognition system for isolated handwritten Bangla words, with a fixed lexicon, using a Hidden Markov Model (HMM). A stochastic search method, namely, Genetic Algorithm (GA) is used to train the HMM. The HMM is a left-right HMM. For feature extraction, the image boundary is traced both in the anticlockwise and clockwise directions and the significant changes in direction along the boundary are noted. Certain features defined on the basis of these changes are used in the proposed model.
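Tracing a boundary and noting significant direction changes can be sketched with Freeman chain codes. The sketch assumes the boundary is already available as an ordered list of 8-connected points (x to the right, y upward) and treats a turn of at least `min_turn` x 45 degrees as significant; that threshold and the function names are hypothetical. Traversing the point list in reverse gives the opposite tracing direction.

```python
# 8-direction Freeman codes, counted anticlockwise from east (x right, y up).
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(points):
    """Freeman codes for an 8-connected sequence of boundary points."""
    return [FREEMAN[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def direction_changes(codes, min_turn=2):
    """(index, turn) pairs where the direction turns by >= min_turn x 45 degrees."""
    changes = []
    for i in range(1, len(codes)):
        d = (codes[i] - codes[i - 1]) % 8
        turn = min(d, 8 - d)  # circular difference, 0..4 steps of 45 degrees
        if turn >= min_turn:
            changes.append((i, turn))
    return changes
```

For example, a path that goes right, then up, then left yields the codes `[0, 0, 2, 2, 4]` and two 90-degree turns at positions 2 and 4.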
Document Analysis Systems | 2014
Tapan Kumar Bhowmik; Thierry Paquet; Nicolas Ragot
In this paper, we describe a novel and simple technique for predicting OCR results without using any OCR. The technique uses a bag of allographs to characterize textual components. A support vector regression (SVR) technique is then used to build a predictor based on the bag of allographs. The performance of the system is evaluated on a corpus of historical documents. The proposed technique predicts OCR results on training and test documents within a standard deviation of 4.18% and 6.54%, respectively. The system has been designed as a tool to assist in selecting corpora in libraries and to specify the typical performance that can be expected on the selection.
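A bag-of-allographs descriptor can be sketched as a normalised histogram of nearest-prototype assignments. This assumes each textual component is already described by a fixed-length feature vector and that the allograph prototypes are given (for instance as k-means centroids); the function name and both inputs are hypothetical. The resulting per-document histograms would then serve as input to a regressor such as scikit-learn's `sklearn.svm.SVR`, which is not included here.

```python
import numpy as np


def bag_of_allographs(component_features, allographs):
    """Normalised histogram of nearest-allograph assignments for one document.

    component_features : (n_components, d) descriptors of the textual components
    allographs         : (k, d) prototype shapes, e.g. k-means centroids
    """
    X = np.asarray(component_features, dtype=float)
    P = np.asarray(allographs, dtype=float)
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # squared distances
    nearest = d2.argmin(axis=1)  # index of the closest allograph per component
    hist = np.bincount(nearest, minlength=len(P)).astype(float)
    return hist / hist.sum()
```

Normalising the histogram makes documents with different numbers of components comparable, so the regressor sees the distribution of shapes rather than the document length.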
International Conference on Neural Information Processing | 2011
Ranjit Ghoshal; Anandarup Roy; Tapan Kumar Bhowmik; Swapan K. Parui
This article proposes a scheme for automatic recognition of Bangla text extracted from outdoor scene images. For extraction, we obtain the headline and then apply certain conditions to distinguish between text and non-text. Removing the headline partitions the text into two zones, and we observe an association among the text symbols in these two zones. For recognition, we design a decision tree classifier with multilayer perceptrons (MLPs) at the leaf nodes. The root node takes into account all possible text symbols. Further nodes highlight distinguishable features and act as two-class classifiers. At the leaf nodes, only a few text symbols remain, which are recognized using MLP classifiers. The association between the two zones makes recognition simpler and more efficient. The classifiers are trained using about 7100 samples of 52 classes. Experiments are performed on 250 images (200 scene images and 50 scanned images).
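Splitting text into two zones at the headline can be sketched with a horizontal projection profile. This assumes a binary image with ink pixels equal to 1 and takes the row with the most ink as the headline, with `band` rows on either side counted as part of it; this projection-based detector is a swapped-in simplification, not the headline extraction used in the paper, and the function name is hypothetical.

```python
import numpy as np


def split_zones(binary, band=1):
    """Split a binary text image (1 = ink) into zones above and below its headline.

    The headline is taken as the row with the most ink (maximum of the
    horizontal projection); `band` rows on each side are treated as part of it.
    """
    binary = np.asarray(binary)
    profile = binary.sum(axis=1)  # ink count per row
    h = int(profile.argmax())
    upper = binary[: max(h - band, 0)]
    lower = binary[h + band + 1:]
    return h, upper, lower
```

The upper and lower zones returned here are where the symbol association described above would be examined.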
ACM Transactions on Asian and Low-Resource Language Information Processing | 2016
Tapan Kumar Bhowmik; Swapan K. Parui; Utpal Roy; Lambert Schomaker
In this article, we propose a new framework for segmenting Bangla handwritten word images into meaningful individual symbols, or pseudo-characters. Existing segmentation algorithms do not usually treat segmentation as a classification problem; in the present study, however, it is posed as a two-class supervised classification problem. The method employs an SVM classifier to select the segmentation points on the word image on the basis of various structural features. To train the SVM classifier, an unannotated training set is first prepared from candidate segmenting points. The training set is then clustered, and each cluster is labeled manually, which keeps manual intervention to a minimum. A semi-automatic bootstrapping technique is also employed to enlarge the training set with new samples. The overall architecture is a basic step toward building an annotation system for the segmentation problem, which has not so far been investigated. The experimental results show that the segmentation method is quite efficient at segmenting not only word images but also handwritten text. As part of this work, a database of Bangla handwritten word images has also been developed. Considering our data collection method and a statistical analysis of our lexicon set, we claim that the relevant characteristics of an ideal lexicon set are present in our handwritten word image database.
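The cluster-then-label idea can be sketched with k-means: candidate segmenting points (as feature vectors) are clustered, one representative per cluster is shown to an annotator, and its label is propagated to the whole cluster. The plain k-means, the choice of the point nearest the centroid as representative, and the `ask_label` callback are assumptions for illustration; the SVM training and bootstrapping stages are not shown.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; returns per-sample cluster labels and the centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def label_by_cluster(X, k, ask_label):
    """Cluster candidate segmentation points, then label one point per cluster.

    ask_label(i) returns 1 (true segmentation point) or 0 (reject) for sample i;
    in practice a human annotator answers, so only k labels are requested.
    """
    X = np.asarray(X, dtype=float)
    labels, centers = kmeans(X, k)
    y = np.zeros(len(X), dtype=int)
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        # the member closest to the centroid represents the cluster
        rep = members[((X[members] - centers[j]) ** 2).sum(axis=1).argmin()]
        y[members] = ask_label(rep)
    return y
```

The annotation effort then scales with the number of clusters rather than the number of candidate points, which is the source of the minimal manual intervention.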