Publication


Featured research published by Szilárd Vajda.


International Conference on Frontiers in Handwriting Recognition | 2004

A system towards Indian postal automation

Kaushik Roy; Szilárd Vajda; Umapada Pal; B. B. Chaudhuri

In this paper, we present a system towards Indian postal automation. In the proposed system, we first decompose the image into blocks using the run length smoothing algorithm (RLSA). Based on the black pixel density and the number of components inside a block, non-text blocks (postal stamp, postal seal, etc.) are detected. Using positional information, the destination address block (DAB) is identified among the text blocks. Next, the pin-code box is detected in the DAB and the numerals are extracted from it. Since India is a multi-lingual and multi-script country, the address part may be written in a combination of two languages: English and a local language. For sorting postal documents written in English and the local language Bangla, a two-stage MLP-based classifier is employed to recognise Bangla and English numerals. At present, the accuracy of the handwritten numeral recognition module is 92.10%.
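The block decomposition step relies on run length smoothing. The following is a minimal sketch of the horizontal pass only, assuming a binary image stored as lists of 0 (white) and 1 (black). The function names and the threshold value are illustrative and not taken from the paper, and this variant fills only white gaps that are bounded by black pixels on both sides.

```python
def rlsa_row(row, threshold):
    """Horizontal run-length smoothing of one binary row (1 = black, 0 = white).

    White gaps of at most `threshold` pixels that lie between two black
    pixels are turned black. Leading and trailing white runs are left
    untouched (an assumption of this sketch).
    """
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= threshold:
                # Fill the short white gap between two black pixels.
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply the horizontal smoothing pass to every row of a binary image."""
    return [rlsa_row(row, threshold) for row in image]


smoothed = rlsa_row([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], threshold=2)
```

With a threshold of 2, the two-pixel gap is merged into one black run while the four-pixel gap is kept, which is how nearby characters fuse into blocks while separate regions stay apart.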


International Conference on Document Analysis and Recognition | 2005

A system for Indian postal automation

Kaushik Roy; Szilárd Vajda; Umapada Pal; B. B. Chaudhuri; Abdel Belaïd

In this paper, we present a system towards Indian postal automation based on the recognition of the pin-code and city name of the postal document. In the proposed system, non-text blocks (postal stamp, postal seal, etc.) are first detected and the destination address block (DAB) is identified in the document. Next, the lines and words of the DAB are segmented. Since India is a multi-lingual and multi-script country, the address part may be written in a combination of two scripts. To identify the script in which a word is written, we propose a water reservoir based technique. It is very difficult to identify the script in which the pin-code portion is written, so we have used two-stage artificial neural network (NN) based general classifiers for the recognition of pin-code digits written in English/Bangla. For recognition of city names, we propose an NSHP-HMM (non-symmetric half plane-hidden Markov model) based technique.


International Conference on Frontiers in Handwriting Recognition | 2010

Online Bangla Word Recognition Using Sub-Stroke Level Features and Hidden Markov Models

Gernot A. Fink; Szilárd Vajda; Ujjwal Bhattacharya; Swapan K. Parui; B. B. Chaudhuri

For automatic recognition of Bangla script, only a few studies are reported in the literature, which is in contrast to the role of Bangla as one of the world's major scripts. In this paper we present a new approach to online Bangla handwriting recognition, one of the first to consider cursively written words instead of isolated characters. Our method uses a sub-stroke level feature representation of the script and a writing model based on hidden Markov models. Since an appropriate internal structure is crucial for the latter, we investigate different approaches to defining model structures for a highly compositional script like Bangla. In experimental evaluations of a writer-independent Bangla word recognition task, we show that the use of context-dependent sub-word units achieves quite promising results and significantly outperforms alternatively structured models.


International Journal of Pattern Recognition and Artificial Intelligence | 2009

Automation of Indian Postal Documents written in Bangla and English

Szilárd Vajda; Kaushik Roy; Umapada Pal; B. B. Chaudhuri; Abdel Belaïd

In this paper, we present a system towards Indian postal automation based on pin-code and city name recognition. First, using the Run Length Smoothing Approach (RLSA), non-text blocks (postal stamp, postal seal, etc.) are detected and, using positional information, the Destination Address Block (DAB) is identified in postal documents. Next, the lines and words of the DAB are segmented. In India, the address part of a postal document may be written in a combination of two scripts: Latin (English) and a local (State/region) script. It is very difficult to identify the script in which the pin-code part is written. To overcome this problem, we have used a two-stage artificial neural network based general scheme to recognize pin-code numbers written in either of the two scripts. To identify the script in which a word/city name is written, we propose a water reservoir concept based feature. For recognition of city names, we propose an NSHP-HMM (Non-Symmetric Half Plane-Hidden Markov Model) based technique. At present, the accuracy of the proposed numeral recognition module is 93.14%, while that of the city name recognition scheme is 86.44%.


International Conference on Document Analysis and Recognition | 2011

A Semi-supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort

Szilárd Vajda; Akmal Junaidi; Gernot A. Fink

One of the major issues in handwritten character recognition is the efficient creation of ground truth to train and test the different recognizers. The manual labeling of the data by a human expert is a tedious and costly procedure. In this paper we propose an efficient and low-cost semi-automatic labeling system for character datasets. First, the data is represented at different abstraction levels, each of which is then clustered in an unsupervised manner. The different clusters are labeled by human experts, and finally a unanimity vote decides whether a label is accepted or not. The experimental results show that labeling less than 0.5% of the training data is sufficient to achieve an 86.21% recognition rate for a brand new script (Lampung) and 94.81% for the MNIST benchmark dataset, using only a K-nearest neighbor classifier for recognition.


International Conference on Frontiers in Handwriting Recognition | 2004

On the choice of training set, architecture and combination rule of multiple MLP classifiers for multiresolution recognition of handwritten characters

Ujjwal Bhattacharya; Szilárd Vajda; Anirban Mallick; B. B. Chaudhuri; Abdel Belaïd

A script-independent recognition scheme for handwritten characters using multiple MLP classifiers and wavelet transform-based multiresolution pixel features is presented. We studied four different approaches for combining multiple MLP classifiers and observed that a weighted majority voting approach provided the best recognition performance. Also, a rule of thumb for the selection of network architecture has been obtained and a dynamic strategy for selection of training samples has been studied. The dynamic training set selection approach often makes the training procedure several times faster than the traditional training scheme. In our simulations, 98.04% recognition accuracy has been obtained on a test set of 5000 handwritten Bangla (an Indian script) numerals. Our approach is script independent and sufficiently fast for real-life applications. The recognition performance of the present approach on the MNIST database of handwritten English digits is comparable to the state of the art.
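As an illustration of the combination rule named above, here is a minimal sketch of weighted majority voting over the outputs of several classifiers. The weights in the example are hypothetical (they could, for instance, come from validation accuracy); the abstract does not state how the paper derives them, and ties are broken here by the first label encountered.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # max() keeps the first label in insertion order when totals are equal.
    return max(scores, key=scores.get)


# Three classifiers, e.g. one per wavelet resolution level (assumed setup).
strong_single = weighted_majority_vote([3, 8, 8], [0.9, 0.4, 0.4])
weak_single = weighted_majority_vote([3, 8, 8], [0.5, 0.4, 0.4])
```

In the first call a single highly weighted classifier overrides two weaker ones; in the second the two agreeing classifiers win, which is the behavior that distinguishes weighted voting from a plain majority.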


Pattern Recognition | 2014

Semi-supervised learning for character recognition in historical archive documents

Jan Richarz; Szilárd Vajda; Rene Grzeszick; Gernot A. Fink

Training recognizers for handwritten characters is still a very time consuming task involving tremendous amounts of manual annotations by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in form of a database of handwritten historical weather reports. The experiments show that our method is able to significantly reduce the human effort that is required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to a supervised classification experiment.

Highlights: We present semi-supervised labeling strategies that are able to considerably reduce the human effort. Two different methods to label and later recognize characters in collections of historical archive documents are proposed. A realistic application dealing with handwritten historical weather reports is introduced. Both methods are evaluated on the MNIST database of handwritten digits and the historical weather reports.


International Conference on Frontiers in Handwriting Recognition | 2012

Bag-of-Features Representations for Offline Handwriting Recognition Applied to Arabic Script

Leonard Rothacker; Szilárd Vajda; Gernot A. Fink

Due to the great variabilities in human writing, unconstrained handwriting recognition is still considered an open research topic. Recent trends in computer vision, however, suggest that there is still potential for better recognition by improving feature representations. In this paper we focus on feature learning by estimating and applying a statistical bag-of-features model. These models are successfully used in image categorization and retrieval. The novelty here is the integration with a Hidden Markov Model (HMM) that we use for recognition. Our method is evaluated on the IFN/ENIT database consisting of images of handwritten Arabic town and village names.
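To make the bag-of-features idea concrete, the sketch below quantizes local descriptors against a fixed codebook and builds a normalized histogram of codeword occurrences. It uses hard nearest-codeword assignment, which is a simplification; the paper estimates a statistical model and couples it with an HMM, neither of which is shown here. The codebook and descriptors are made-up toy values.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_features(descriptors, codebook):
    """Normalized histogram of codeword assignments for a set of local descriptors."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy 2-D descriptors and a two-entry codebook (hypothetical values; a real
# codebook would typically be learned by clustering training descriptors).
codebook = [(0, 0), (10, 10)]
histogram = bag_of_features([(1, 0), (9, 9), (0, 1), (1, 1)], codebook)
```

For sequence models such as HMMs, a histogram like this would be computed per sliding window over the word image rather than once per image, so that each window yields one observation vector.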


Pattern Recognition Letters | 2015

Semi-automatic ground truth generation using unsupervised clustering and limited manual labeling

Szilárd Vajda; Yves Rangoni; Hubert Cecotti

For training supervised classifiers to recognize different patterns, large data collections with accurate labels are necessary. In this paper, we propose a generic, semi-automatic labeling technique for large handwritten character collections. In order to speed up the creation of a large-scale ground truth, the method combines unsupervised clustering and minimal expert knowledge. To exploit the potential discriminant complementarities across features, each character is projected into five different feature spaces. After clustering the images in each feature space, the human expert labels the cluster centers. Each data point inherits the label of its cluster's center. A majority (or unanimity) vote decides the label of each character image. The amount of human involvement (labeling) is strictly controlled by the number of clusters produced by the chosen clustering approach. To test the efficiency of the proposed approach, we have compared and evaluated three state-of-the-art clustering methods (k-means, self-organizing maps, and growing neural gas) on the MNIST digit data set and a Lampung Indonesian character data set. Using a k-NN classifier, we show that manually labeling only 1.3% (MNIST) and 3.2% (Lampung) of the training data provides the same range of performance as a completely labeled data set would.
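The voting step described above can be sketched as follows. The sketch assumes clustering has already been done in each feature space and that an expert has labeled every cluster; only the label inheritance and the majority or unanimity vote are shown. The data layout, the function name, and the choice to return None for samples that fail the vote are assumptions made for illustration, and the example uses three views rather than the five feature spaces mentioned in the abstract.

```python
from collections import Counter


def propagate_labels(assignments, cluster_labels, unanimous=False):
    """Vote on a label for every sample across several feature spaces (views).

    assignments[v][i]    -> cluster id of sample i in view v
    cluster_labels[v][c] -> expert label given to cluster c in view v
    Returns one label per sample, or None when the vote is not decisive.
    """
    n_views = len(assignments)
    n_samples = len(assignments[0])
    # Unanimity needs every view to agree; majority needs more than half.
    needed = n_views if unanimous else n_views // 2 + 1
    result = []
    for i in range(n_samples):
        # Each sample inherits the label of its cluster in every view.
        votes = [cluster_labels[v][assignments[v][i]] for v in range(n_views)]
        label, count = Counter(votes).most_common(1)[0]
        result.append(label if count >= needed else None)
    return result


# Three views, three samples, two clusters per view (toy values).
assignments = [[0, 1, 0], [1, 1, 0], [0, 0, 0]]
cluster_labels = [{0: "a", 1: "b"}, {0: "a", 1: "b"}, {0: "a", 1: "b"}]
by_majority = propagate_labels(assignments, cluster_labels)
by_unanimity = propagate_labels(assignments, cluster_labels, unanimous=True)
```

The unanimity rule labels fewer samples but with higher agreement, while the majority rule covers more of the data; in both cases the expert's effort is bounded by the total number of clusters rather than the number of samples.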


Analytics for Noisy Unstructured Text Data | 2011

Lampung - a new handwritten character benchmark: database, labeling and recognition

Akmal Junaidi; Szilárd Vajda; Gernot A. Fink

This research paper deals with our effort to create a dataset of, and recognize, isolated characters of Lampung, a script originating from Indonesia. The aim is to describe this new script with all its peculiarities, to propose a labeling scheme for managing a large isolated character dataset, and finally to propose a recognition scheme based on the water reservoir concept. The Lampung script, which descends from the Brahmi script, is used in Lampung Province and is close to extinction unless initiatives such as ours direct attention to this cultural heritage. The collected dataset contains isolated characters taken from transcriptions of fairy tales, which were annotated with a semi-automatic labeling method requiring limited human effort. Our attention is focused not only on the database collection but on recognition as well. For this purpose, a water reservoir based feature set is proposed that exploits the different cavities of the character shapes and the measures derived from them. The experimental results (94.27%) demonstrate the efficiency of the method for a brand new script and feature set.

Collaboration

Top Co-Authors

Gernot A. Fink (Technical University of Dortmund)

Sameer K. Antani (National Institutes of Health)

George R. Thoma (National Institutes of Health)

K. C. Santosh (University of South Dakota)

B. B. Chaudhuri (Indian Statistical Institute)

Eugene Borovikov (National Institutes of Health)

Thomas Plötz (Georgia Institute of Technology)

Kaushik Roy (Indian Statistical Institute)

Umapada Pal (Indian Statistical Institute)