Swapan K. Parui | Researchain

Publication

Featured researches published by Swapan K. Parui.

ACM Transactions on Information Systems | 2007

YASS: Yet another suffix stripper

Prasenjit Majumder; Mandar Mitra; Swapan K. Parui; Gobinda Kole; Pabitra Mitra; Kalyankumar Datta

Stemmers attempt to reduce a word to its stem or root form and are used widely in information retrieval tasks to increase the recall rate. Most popular stemmers encode a large number of language-specific rules built over a length of time. Such stemmers with comprehensive rules are available only for a few languages. In the absence of extensive linguistic resources for certain languages, statistical language processing tools have been successfully used to improve the performance of IR systems. In this article, we describe a clustering-based approach to discover equivalence classes of root words and their morphological variants. A set of string distance measures is defined, and the lexicon for a given text collection is clustered using the distance measures to identify these equivalence classes. The proposed approach is compared with Porter's and Lovins' stemmers on the AP and WSJ subcollections of the Tipster dataset using 200 queries. Its performance is comparable to that of Porter's and Lovins' stemmers, both in terms of average precision and the total number of relevant documents retrieved. The proposed stemming algorithm also provides consistent improvements in retrieval performance for French and Bengali, which are currently resource-poor.
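
The clustering idea in the abstract above can be illustrated with a minimal Python sketch. The distance function here is a hypothetical stand-in (it only rewards a long shared prefix) and is not one of the measures defined in the article; the use of complete-linkage clustering, the threshold value, and the names `suffix_distance` and `cluster` are likewise assumptions made for illustration.

```python
from itertools import combinations


def suffix_distance(a: str, b: str) -> float:
    """Hypothetical distance: small when two words share a long prefix."""
    # Length of the common prefix.
    m = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        m += 1
    if m == 0:
        return float("inf")  # no shared prefix: never merge
    longest = max(len(a), len(b))
    # Characters left after the shared prefix, relative to the prefix length.
    return (longest - m) / m


def cluster(words, threshold):
    """Complete-linkage agglomerative clustering with a distance cutoff."""
    clusters = [[w] for w in words]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between the two farthest members.
            d = max(suffix_distance(a, b)
                    for a in clusters[i] for b in clusters[j])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


lexicon = ["connect", "connected", "connecting", "connection",
           "index", "indexing", "indexed"]
classes = cluster(lexicon, threshold=1.0)
```

Each resulting cluster plays the role of an equivalence class: every word in it would be mapped to a common representative at indexing and query time.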

International Conference on Pattern Recognition | 2008

Online handwritten Bangla character recognition using HMM

Swapan K. Parui; Koushik Guin; Ujjwal Bhattacharya; B. B. Chaudhuri

We describe here a novel scheme for recognition of online handwritten basic characters of Bangla, an Indian script used by more than 200 million people. There are 50 basic characters in Bangla and we have used a database of 24,500 online handwritten isolated character samples written by 70 persons. Samples in this database are composed of one or more strokes and we have collected all the strokes obtained from the training samples of the 50 character classes. These strokes are manually grouped into 54 classes based on the shape similarity of the graphemes that constitute the ideal character shapes. Strokes are recognized by using hidden Markov models (HMMs). One HMM is constructed for each stroke class. A second stage of classification is used for recognition of characters using the stroke classification results along with 50 look-up tables (one for each of the 50 character classes).

Pattern Recognition | 1994

A robust parallel thinning algorithm for binary images

Amitava Datta; Swapan K. Parui

The class of multi-pass iterative thinning algorithms is considered. A new algorithm in the same class is proposed and is shown, on the basis of experimental results, to be superior to the existing ones with respect to medial axis representation and robustness. Basic properties of thinning, such as one-pixel thickness and preservation of connectivity, are ensured for the present algorithm, and theoretical proofs are given.
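
To give a concrete picture of what a multi-pass iterative parallel thinning algorithm looks like, here is a minimal Python sketch of the well-known Zhang-Suen scheme. It is not the algorithm proposed in the paper above; it is swapped in only because it belongs to the same general class. The sketch assumes a binary image stored as a list of lists with a one-pixel border of zeros.

```python
def zhang_suen(image):
    """Thin a binary image (1 = foreground) with two sub-passes per iteration."""
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel to the north.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    b = sum(n)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if n[i] == 0 and n[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = n[0], n[2], n[4], n[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Parallel update: delete only after the whole pass is evaluated.
            for r, c in to_delete:
                img[r][c] = 0
            if to_delete:
                changed = True
    return img


bar = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 7]
skeleton = zhang_suen(bar)
```

The deferred deletion inside each sub-pass is what makes the procedure "parallel": every pixel's decision depends only on the image state at the start of that pass.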

Indian Conference on Computer Vision, Graphics and Image Processing | 2006

On recognition of handwritten Bangla characters

Ujjwal Bhattacharya; Malayappan Shridhar; Swapan K. Parui

Recently, a few works on recognition of handwritten Bangla characters have been reported in the literature. However, there is scope for further research in this area. In the present article, we report the results of our recent study on recognition of handwritten Bangla basic characters. This is a 50-class problem since the Bangla alphabet has 50 basic characters. In this study, features are obtained by computing local chain code histograms of the input character shape. Recognition results are compared between features computed on the contour and on the one-pixel skeletal representation of the input character image. In both cases, classification results are also obtained after downsampling the histogram feature by applying a Gaussian filter. Multilayer perceptrons (MLPs) trained by the backpropagation (BP) algorithm are used as classifiers in the present study. Near-exhaustive studies are done for selection of the hidden layer size. An analysis of the misclassified samples shows an interesting error pattern, and this has been used for further improvement in the recognition results. Final recognition accuracies on the training and test sets are 94.65% and 92.14%, respectively.
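
The notion of a local chain code histogram can be sketched as follows in Python. This is an illustrative sketch only: it assumes the contour is already available as an ordered list of 8-connected (row, column) points, and the grid size, the Freeman direction numbering, and the function names are assumptions rather than details taken from the article.

```python
# Freeman 8-direction codes keyed by (d_row, d_col); 0 = east, counter-clockwise.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_codes(contour):
    """Direction code between each pair of successive contour points."""
    return [DIRECTIONS[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(contour, contour[1:])]


def local_histograms(contour, height, width, grid=2):
    """Count direction codes separately inside each cell of a grid x grid layout."""
    hist = [[[0] * 8 for _ in range(grid)] for _ in range(grid)]
    for (r, c), code in zip(contour, chain_codes(contour)):
        cell_r = min(r * grid // height, grid - 1)
        cell_c = min(c * grid // width, grid - 1)
        hist[cell_r][cell_c][code] += 1
    return hist


# Closed contour of a 3x3 square, traversed clockwise from the top-left corner.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
          (2, 1), (2, 0), (1, 0), (0, 0)]
features = local_histograms(square, height=3, width=3, grid=1)
```

Concatenating the per-cell histograms gives a fixed-length feature vector that could then be smoothed, downsampled, and fed to a classifier.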

Pattern Recognition | 1983

Symmetry analysis by computer

Swapan K. Parui; D. Dutta Majumder

Approximate locations of axes of symmetry of a 2-dimensional region are detected on the basis of its border. The border, described in terms of certain directional codes, is treated as a regular polygon in a hierarchical manner where a lower level means a greater number of sides. At each level of the hierarchy, the best axis of symmetry is found; a lower level gives a more accurate position of the unknown axis of symmetry than a higher level. Along with each axis of symmetry, an error is computed, on the basis of which the degree of symmetry of a 2-dimensional region is defined. Programs are written in FORTRAN IV and are implemented on an EC-1033 computer.

International Conference on Document Analysis and Recognition | 2007

Direction Code Based Features for Recognition of Online Handwritten Characters of Bangla

Ujjwal Bhattacharya; Bikash K. Gupta; Swapan K. Parui

In the present article, we describe a novel direction code based feature extraction approach for recognition of online Bangla handwritten basic characters. We have implemented the proposed approach on a database of 7043 online handwritten Bangla (a major script of the Indian subcontinent) character samples, which has been developed by us. This is a 50-class recognition problem and we achieved 93.90% and 83.61% recognition accuracies respectively on its training and test sets.

International Journal of Pattern Recognition and Artificial Intelligence | 2002

A HYBRID SCHEME FOR HANDPRINTED NUMERAL RECOGNITION BASED ON A SELF-ORGANIZING NETWORK AND MLP CLASSIFIERS

Ujjwal Bhattacharya; Tanmoy Kanti Das; Amitava Datta; Swapan K. Parui; B. B. Chaudhuri

This paper proposes a novel approach to automatic recognition of handprinted Bangla (an Indian script) numerals. A modified Topology Adaptive Self-Organizing Neural Network is proposed to extract a vector skeleton from a binary numeral image. Simple heuristics are considered to prune artifacts, if any, in such a skeletal shape. Certain topological and structural features such as loops, junctions, and positions of terminal nodes are used along with a hierarchical tree classifier to classify handwritten numerals into smaller subgroups. Multilayer perceptron (MLP) networks are then employed to uniquely classify the numerals belonging to each subgroup. The system is trained using a sample data set of 1800 numerals, and we have obtained a 93.26% correct recognition rate and a 1.71% rejection rate on a separate test set of another 7760 samples. In addition, a validation set consisting of 1440 samples has been used to determine the termination of the training algorithm of the MLP networks. The proposed scheme is sufficiently robust with respect to considerable object noise.

International Journal on Document Analysis and Recognition | 2009

SVM-based hierarchical architectures for handwritten Bangla character recognition

Tapan Kumar Bhowmik; Pradip Ghanty; Anandarup Roy; Swapan K. Parui

We propose support vector machine (SVM) based hierarchical classification schemes for recognition of handwritten Bangla characters. A comparative study is made among the multilayer perceptron, radial basis function network and SVM classifiers for this 45-class recognition problem. The SVM classifier is found to outperform the other classifiers. A fusion scheme using the three classifiers is proposed, which is marginally better than the SVM classifier. It is observed that there are groups of characters having similar shapes. These groups are determined in two different ways on the basis of the confusion matrix obtained from the SVM classifier. In the first, the groups are disjoint, while in the second they overlap. Another grouping scheme, also with disjoint groups, is proposed based on the confusion matrix obtained from the neural gas algorithm. Three different two-stage hierarchical learning architectures (HLAs) are proposed using the three grouping schemes. An unknown character image is classified into a group in the first stage. The second stage recognizes the class within this group. The performance of the HLA schemes is found to be better than that of the single-stage classification schemes. The HLA scheme with overlapped groups outperforms the other two HLA schemes.
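
One way to picture how disjoint groups might be derived from a confusion matrix is the Python sketch below. The merging rule (join two classes when their mutual confusion count reaches a threshold), the threshold value, and the toy matrix are all assumptions made for illustration; the article does not specify its grouping procedure at this level of detail.

```python
def confusion_groups(confusion, threshold):
    """Partition class indices into disjoint groups of mutually confused classes."""
    n = len(confusion)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Symmetric confusion: i mistaken for j plus j mistaken for i.
            if confusion[i][j] + confusion[j][i] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for c in range(n):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


# Toy 4-class confusion matrix (rows: true class, columns: predicted class).
toy = [[90, 8, 1, 1],
       [7, 91, 1, 1],
       [0, 1, 89, 10],
       [1, 0, 9, 90]]
groups = confusion_groups(toy, threshold=10)
```

In a two-stage architecture of the kind described, a first-stage classifier would pick one of these groups and a second-stage classifier, trained only on that group's classes, would pick the final label.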

International Conference on Document Analysis and Recognition | 2009

Devanagari and Bangla Text Extraction from Natural Scene Images

Ujjwal Bhattacharya; Swapan K. Parui; Srikanta Mondal

With the increasing popularity of digital cameras attached to various handheld devices, many new computational challenges have gained significance. One such problem is extraction of text from natural scene images captured by such devices. The extracted text can be sent to an OCR or text-to-speech engine for recognition. In this article, we propose a novel and effective scheme based on analysis of connected components for extraction of Devanagari and Bangla text from camera-captured scene images. A common distinctive feature of these two scripts is the presence of a headline, and the proposed scheme uses mathematical morphology operations for its extraction. Additionally, we consider a few criteria for robust filtering of text components from such scene images. Moreover, we studied the problem of binarization of such scene images and observed that there are situations in which repeated binarization by a well-known global thresholding approach is effective. We tested our algorithm on a repository of 100 scene images containing Devanagari and/or Bangla text.

Computer Vision and Image Understanding | 1997

A novel approach to computation of the shape of a dot pattern and extraction of its perceptual border

A. Ray Chaudhuri; B. B. Chaudhuri; Swapan K. Parui

A novel approach to defining the external shape of a dot pattern is proposed from which the intuitive border of the set is extracted. The approach is based on a new definition called the s-shape, which can be generated by a data-driven procedure. The s-shape generates a staircase-like border. To obtain a polygonal border, an r-shape is defined for which the parameter r is found from s, the parameter of the s-shape. The main advantage of this approach is that it can be computed in O(n) time for a dot pattern containing n points. The approach has three basic steps: (i) choice of an appropriate s (and corresponding r) from the given point set, (ii) generation of the r-shape, and (iii) cleaning of inconsistent parts from the r-shape. The diagram composed of the consistent edges of the r-shape is considered the perceived border of the dot pattern. A new structural basis called the dispersion matrix is evolved. Extension of the work to the digital case is discussed. The algorithm for extracting the perceptual border is fast since it is mainly composed of basic operations such as nonnegative integer addition and logical operations. Moreover, it can be implemented on parallel machines since the operations are local in the point space.
