
Toni M. Rath

University of Massachusetts Amherst


Publication

Featured research published by Toni M. Rath.

First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings. | 2004

Holistic word recognition for handwritten historical documents

Victor Lavrenko; Toni M. Rath; R. Manmatha

Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters), which are recognized separately. The recognition result for a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, motivated by the fact that, for severely degraded documents, segmenting words into characters produces very poor results. The quality of the original documents does not allow us to recognize them with high accuracy; our goal here is to produce transcriptions that will allow successful retrieval of images, which has been shown to be feasible even in such noisy environments. We believe that this is the first systematic approach to recognizing words in historical manuscripts that is backed by extensive experiments. Our experiments show a recognition accuracy of 65%, which exceeds the performance of other systems that operate on non-degraded input images (non-historical documents).
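The abstract does not describe the recognizer's features or classifier, so the following is only a minimal sketch of what "holistic" means in practice. It assumes, hypothetically, that each word image has already been reduced to a fixed-length feature vector (feature extraction is omitted), and it assigns the whole vector to the nearest word class instead of splitting it into characters. The nearest-class-mean rule is used purely for illustration; it is not the authors' model.

```python
import math
from collections import defaultdict


def train_class_means(features, labels):
    """Average the whole-word feature vectors that belong to each word class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def recognize(vec, means):
    """Label a word image with the class whose mean vector is closest (Euclidean).

    The word is classified as a whole; no character segmentation takes place.
    """
    return min(means, key=lambda label: math.dist(vec, means[label]))


def accuracy(predicted, truth):
    """Fraction of word images whose predicted label matches the transcription."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Holistic classification sidesteps segmentation errors, at the cost of only being able to recognize words that appear in the training vocabulary.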

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

A search engine for historical manuscript images

Toni M. Rath; R. Manmatha; Victor Lavrenko

Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access to them. These collections are available only as images, and making them accessible requires expensive manual annotation. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution between features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We report experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. Precision at 20 documents is about 0.4 to 0.5, depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts using text queries, without manual transcription of the original corpus.
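Precision at 20 documents, the figure quoted above, is the fraction of the top 20 retrieved page images that are relevant to the query. A minimal sketch of the metric, averaged over queries; the page identifiers and relevance sets are hypothetical inputs, not data from the George Washington collection:

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked items that are relevant.

    Divides by k, not by the number of items actually returned.
    """
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / k


def mean_precision_at_k(runs, k=20):
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

For example, a query with 9 relevant pages among its top 20 results scores 9 / 20 = 0.45.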

Computer Vision and Pattern Recognition | 2003

Using Corner Feature Correspondences to Rank Word Images by Similarity

Jamie L. Rothfeder; Shaolei Feng; Toni M. Rath

Libraries contain enormous amounts of handwritten historical documents that cannot be made available online because they lack a searchable index. Word spotting has previously been proposed as a way to create indexes for such documents and collections by matching word images. In this paper we present an algorithm that compares whole word images based on their appearance. The algorithm recovers correspondences between points of interest in two images and then uses these correspondences to construct a similarity measure. This measure can then be used to rank word images by their closeness to a query image. We achieved an average precision of 62.57% on a set of 2372 images of reasonable quality and an average precision of 15.49% on a set of 3262 images from documents of such poor quality that they are hard even for humans to read.
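The abstract describes two steps: recover correspondences between points of interest in two word images, then turn them into a similarity measure used for ranking. The sketch below is a simplified, geometry-only stand-in, assuming corners have already been detected and normalized to the unit square. It pairs corners by mutual nearest neighbours within a hypothetical radius `max_dist`, charges each matched corner its distance and each unmatched corner `max_dist`, and ranks candidates by the average cost. The paper's actual correspondence recovery and scoring may differ.

```python
import math


def correspondences(points_a, points_b, max_dist=0.1):
    """Mutual nearest-neighbour pairs (i, j) between two corner sets."""
    if not points_a or not points_b:
        return []

    def nearest(p, pts):
        return min(range(len(pts)), key=lambda j: math.dist(p, pts[j]))

    pairs = []
    for i, p in enumerate(points_a):
        j = nearest(p, points_b)
        # Keep the pair only if the match is mutual and close enough.
        if nearest(points_b[j], points_a) == i and math.dist(p, points_b[j]) <= max_dist:
            pairs.append((i, j))
    return pairs


def dissimilarity(points_a, points_b, max_dist=0.1):
    """Average per-corner cost: matched corners cost their distance,
    unmatched corners cost max_dist. Lower means more similar."""
    total = len(points_a) + len(points_b)
    if total == 0:
        return 0.0
    pairs = correspondences(points_a, points_b, max_dist)
    matched = sum(2 * math.dist(points_a[i], points_b[j]) for i, j in pairs)
    unmatched = total - 2 * len(pairs)
    return (matched + unmatched * max_dist) / total


def rank_word_images(query, candidates, max_dist=0.1):
    """Order candidate word images (name -> corner list) by closeness to the query."""
    return sorted(candidates, key=lambda name: dissimilarity(query, candidates[name], max_dist))
```

Penalizing unmatched corners keeps a candidate with only a few coincidentally close corners from ranking above one whose overall shape agrees with the query.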

Intelligent Data Analysis | 2003

Automated Detection of Influenza Epidemics with Hidden Markov Models

Toni M. Rath; Maximo Carreras; Paola Sebastiani

We present a method for automated detection of influenza epidemics. The method uses Hidden Markov Models with an Exponential-Gaussian mixture to characterize the non-epidemic and epidemic dynamics in a time series of influenza-like illness incidence rates. Our evaluation on real data shows a reduction in the number of false detections compared to previous approaches and increased robustness to variations in the data.
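To make the HMM idea concrete, here is a minimal two-state Viterbi decoder. For illustration only, it assumes a Gaussian emission for the non-epidemic state and an exponential emission for the epidemic state. The parameters (`mu`, `sigma`, `lam`, `p_stay`) are hypothetical values, not values fitted to real surveillance data. Time points decoded as state 1 would be flagged as epidemic.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of a Gaussian (non-epidemic incidence rates)."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_expon(x, lam):
    """Log density of an exponential (long-tailed epidemic incidence rates)."""
    return math.log(lam) - lam * x if x >= 0 else -math.inf


def viterbi_epidemic(rates, mu=2.0, sigma=0.5, lam=0.2, p_stay=0.9):
    """Most likely state sequence: 0 = non-epidemic, 1 = epidemic."""
    if not rates:
        return []
    emit = [lambda x: log_gauss(x, mu, sigma), lambda x: log_expon(x, lam)]
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    # Uniform initial state distribution (an assumption).
    scores = [math.log(0.5) + emit[s](rates[0]) for s in (0, 1)]
    back = []  # back[t - 1][s] = best predecessor of state s at time t
    for x in rates[1:]:
        pointers, new_scores = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: scores[r] + log_trans[r][s])
            pointers.append(prev)
            new_scores.append(scores[prev] + log_trans[prev][s] + emit[s](x))
        back.append(pointers)
        scores = new_scores
    # Trace the best path backwards from the most likely final state.
    state = max((0, 1), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

The sticky transition probability `p_stay` is what suppresses isolated false alarms: a single high value must outweigh the cost of switching states twice before it is labeled epidemic.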

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Boosted decision trees for word recognition in handwritten document retrieval

Nicholas R. Howe; Toni M. Rath; R. Manmatha

Recognition and retrieval of historical handwritten material is an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts, based upon word image classification as a key step. Decision trees with normalized pixels as features form the basis of a highly accurate AdaBoost classifier, trained on a corpus of word images that have been resized and sampled at a pyramid of resolutions. To counter problems arising from the highly skewed distribution of class frequencies, word classes with very few training samples are augmented with stochastically altered versions of the originals. This increases recognition performance substantially. On a standard corpus of 20 pages of handwritten material from the George Washington collection, recognition performance improves substantially over previously published results (75% vs. 65%). Following word recognition, retrieval is done using a language model over the recognized words. Retrieval performance also improves substantially over previously published results on this database. Recognition and retrieval results on a more challenging database of 100 pages from the George Washington collection are also presented.
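The abstract says retrieval uses "a language model over the recognized words" without giving the formulation. A common choice, used here as an assumption, is query likelihood with Jelinek-Mercer smoothing: each page is scored by log P(q | page) = Σ_w log((1 - λ) tf(w, page) / |page| + λ cf(w) / |C|), summed over the query words w. The page contents would be the recognizer's output, and the smoothing weight `lam` is a hypothetical default.

```python
import math
from collections import Counter


def query_likelihood(query, page_words, collection_counts, collection_size, lam=0.5):
    """log P(query | page) under a Jelinek-Mercer smoothed unigram model."""
    tf = Counter(page_words)
    n = len(page_words)
    score = 0.0
    for w in query:
        p_page = tf[w] / n if n else 0.0
        p_coll = collection_counts[w] / collection_size
        p = (1 - lam) * p_page + lam * p_coll
        if p == 0.0:
            return -math.inf  # query word never recognized anywhere in the collection
        score += math.log(p)
    return score


def rank_pages(query, pages, lam=0.5):
    """Rank pages (id -> list of recognized words) by descending query likelihood."""
    coll = Counter(w for words in pages.values() for w in words)
    size = sum(coll.values())
    return sorted(pages,
                  key=lambda pid: query_likelihood(query, pages[pid], coll, size, lam),
                  reverse=True)
```

Smoothing with collection statistics keeps a page from scoring zero just because one query word was misrecognized on it.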

Document Analysis Systems | 2006

Aligning transcripts to automatically segmented handwritten manuscripts

Jamie L. Rothfeder; R. Manmatha; Toni M. Rath

Training and evaluating techniques for handwriting recognition and retrieval is challenging because large ground-truthed datasets are difficult to create. This is especially true for historical handwritten datasets. In many instances the ground truth has to be created by manually transcribing each word, which is a very labor-intensive process. Sometimes transcriptions are available for some manuscripts, but because they were created for other purposes, correspondence at the word, line, or sentence level may not be available. To be useful for training and evaluation, a word-level correspondence must be available between the segmented handwritten word images and the ASCII transcriptions. Creating this correspondence, or alignment, is challenging because the segmentation is often error-prone and the ASCII transcription may also contain errors. Very little work has been done on aligning handwritten data to transcripts. Here, a novel automatic alignment algorithm based on Hidden Markov Models is described and tested. The algorithm achieves an average alignment accuracy of about 72.8% when aligning whole pages at a time on a set of 70 pages of the George Washington collection. This outperforms, by about 12%, a dynamic time warping alignment algorithm previously reported in the literature and tested on the same collection.
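The abstract does not specify the HMM aligner, so the following sketch swaps in a simpler technique: a monotone dynamic-programming alignment, in the spirit of edit distance, between segmented word images and transcript words. It assumes, hypothetically, that a word image's width divided by an average character width `char_width` approximates the word's length in characters. Images or words may also go unaligned at a fixed `gap_cost`, which absorbs segmentation errors such as a spurious fragment.

```python
def align(image_widths, words, char_width=10.0, gap_cost=3.0):
    """Monotone alignment of word images to transcript words.

    Matching image i to word j costs |width_i / char_width - len(word_j)|;
    leaving an image or a word unaligned costs gap_cost.
    Returns a list of (image_index, word_index) pairs.
    """
    n, m = len(image_widths), len(words)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + abs(image_widths[i - 1] / char_width - len(words[j - 1]))
                if c < cost[i][j]:
                    cost[i][j], move[i][j] = c, "match"
            if i > 0 and cost[i - 1][j] + gap_cost < cost[i][j]:
                cost[i][j], move[i][j] = cost[i - 1][j] + gap_cost, "skip_image"
            if j > 0 and cost[i][j - 1] + gap_cost < cost[i][j]:
                cost[i][j], move[i][j] = cost[i][j - 1] + gap_cost, "skip_word"
    # Backtrack from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_image":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

With a spurious fragment between two correctly segmented words, skipping the fragment costs less than forcing it onto a transcript word, so the remaining words still align correctly.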

Conference on Image and Video Retrieval | 2005

Learning shapes for image classification and retrieval

Natasha Mohanty; Toni M. Rath; Audrey Lee; R. Manmatha

Shape descriptors have frequently been used as features to characterize an image for classification and image retrieval tasks. For example, the patent office uses shape similarity to ensure that there are no infringements of registered trademarks. This paper focuses on using machine learning and information retrieval techniques to classify an image into one of many classes based on shape. In particular, we compare Support Vector Machines, Naive Bayes and relevance language models for classification. Our results indicate that, on the MPEG-7 database, the relevance model outperforms the machine learning techniques and is competitive with prior work on shape-based retrieval. We also show how the relevance model approach may be used to perform shape retrieval using keywords. Experiments on the MPEG-7 database and a binary version of the COIL-100 database show good retrieval performance.
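Of the classifiers compared above, Naive Bayes is the easiest to sketch. The example assumes, hypothetically, that each shape has been reduced to a bag of discrete descriptor tokens (for instance, quantized contour features). The token names and class labels are illustrative, and the relevance-model approach the paper favors is not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(examples) for c, n in class_counts.items()}
    return priors, token_counts, vocab


def classify_nb(tokens, model):
    """Return the class with the highest log posterior (Laplace-smoothed)."""
    priors, token_counts, vocab = model
    v = len(vocab)

    def log_posterior(c):
        counts = token_counts[c]
        denom = sum(counts.values()) + v
        return math.log(priors[c]) + sum(math.log((counts[t] + 1) / denom) for t in tokens)

    return max(priors, key=log_posterior)
```

Laplace (add-one) smoothing keeps a single unseen descriptor token from driving a class's probability to zero.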

Computer Vision and Pattern Recognition | 2003