Thomas A. Nartker | Researchain




Publication

Featured research published by Thomas A. Nartker.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1995

Automated evaluation of OCR zoning

Junichi Kanai; Stephen V. Rice; Thomas A. Nartker; George Nagy

Many current optical character recognition (OCR) systems attempt to decompose printed pages into a set of zones, each containing a single column of text, before converting the characters into coded form. The authors present a methodology for automatically assessing the accuracy of such decompositions, and demonstrate its use in evaluating six OCR systems.

Algorithmica | 1997

Classes of cost functions for string edit distance

Stephen V. Rice; Horst Bunke; Thomas A. Nartker

Finding a sequence of edit operations that transforms one string of symbols into another with the minimum cost is a well-known problem. The minimum cost, or edit distance, is a widely used measure of the similarity of two strings. An important parameter of this problem is the cost function, which specifies the cost of each insertion, deletion, and substitution. We show that cost functions having the same ratio of the sum of the insertion and deletion costs divided by the substitution cost yield the same minimum cost sequences of edit operations. This leads to a partitioning of the universe of cost functions into equivalence classes. Also, we show the relationship between a particular set of cost functions and the longest common subsequence of the input strings.
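
The role of the cost function can be made concrete with a short dynamic-programming sketch. This is a textbook formulation, not code from the paper; the function names and the sample strings are illustrative assumptions. It computes the minimum edit cost for arbitrary insertion, deletion, and substitution costs, and checks the well-known special case where insertions and deletions cost 1 and substitutions cost 2, for which the distance equals len(a) + len(b) - 2 * LCS(a, b).

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of edit operations that transform a into b."""
    # prev[j] = cost of turning the empty prefix of a into b[:j] (j insertions).
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            keep_or_substitute = prev[j - 1] + (0 if same else sub_cost)
            delete = prev[j] + del_cost
            insert = curr[j - 1] + ins_cost
            curr.append(min(keep_or_substitute, delete, insert))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j in range(1, len(b) + 1):
            if ch == b[j - 1]:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    a, b = "kitten", "sitting"
    print(edit_distance(a, b))            # unit costs
    print(edit_distance(a, b, 1, 1, 2))   # substitution as costly as delete + insert
    print(len(a) + len(b) - 2 * lcs_length(a, b))
```

With unit costs the sample pair has distance 3; with a substitution cost of 2 the distance is 5, which matches the LCS identity because the longest common subsequence has length 4.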

International Conference on Document Analysis and Recognition | 1993

Performance metrics for document understanding systems

Junichi Kanai; Thomas A. Nartker; Stephen V. Rice; George Nagy

Requirements for the objective evaluation of automated data-entry systems are presented. Because the cost of correcting errors dominates the document conversion process, the most important characteristic of an OCR device is accuracy. However, different measures of accuracy (error metrics) are appropriate for different applications, and at the character, word, text-line, text-block, and document levels. For wholly objective assessment, OCR devices must be tested under programmed, rather than interactive, control.
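
As a rough illustration of how accuracy at two of these levels can differ, the sketch below scores OCR output against ground truth at the character and word levels. The definition used here, (n - errors) / n with errors counted as a unit-cost edit distance and n the number of ground-truth units, is an assumption made for this example and is not taken from the paper; the function names and sample strings are hypothetical.

```python
def levenshtein(reference, hypothesis):
    """Unit-cost edit distance between two sequences (strings or lists)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_unit in enumerate(reference, 1):
        curr = [i]
        for j, hyp_unit in enumerate(hypothesis, 1):
            curr.append(min(
                prev[j - 1] + (ref_unit != hyp_unit),  # match or substitution
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
            ))
        prev = curr
    return prev[-1]


def accuracy(truth_units, ocr_units):
    """(n - errors) / n, where n is the number of ground-truth units."""
    n = len(truth_units)
    if n == 0:
        raise ValueError("ground truth must not be empty")
    return (n - levenshtein(truth_units, ocr_units)) / n


def character_accuracy(truth, ocr):
    return accuracy(truth, ocr)


def word_accuracy(truth, ocr):
    # Whitespace tokenization is a simplification chosen for this sketch.
    return accuracy(truth.split(), ocr.split())


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the qu1ck brown f0x"
    print(character_accuracy(truth, ocr))  # 17 of 19 characters correct
    print(word_accuracy(truth, ocr))       # 2 of 4 words correct
```

Two substituted characters leave character accuracy near 89% but cut word accuracy to 50%, which is one reason a single metric may not suit every application. Under this definition the value can fall below zero when the OCR output contains many spurious insertions.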

Archive | 1999

Optical Character Recognition

Stephen V. Rice; George Nagy; Thomas A. Nartker

This tutorial demonstrates how character recognition can be done with a backpropagation network and shows how to implement this using the Matlab Neural Network toolbox. This is a slightly modified version of the character recognition application of the Matlab Neural Network toolbox (chapter 11).

Usage

This tutorial is also available as a printable PDF file. The Matlab code for this tutorial is part of the Neural Network Toolbox, which is installed on all PCs in the student PC rooms. To start the tutorial, just type appcr1 at the Matlab prompt. To get the offline HTML version of this tutorial and the Matlab code as presented in the exercise course, you have to:

1. Download the file nn-ocr.zip (http://www.igi.tugraz.at/lehre/CI/tutorials/nn-ocr.zip).
2. Unzip nn-ocr.zip, which will generate a subdirectory named nn-ocr.
3. Add the path nn-ocr to the Matlab search path with a command like addpath('C:\Work\nn-ocr') if you are using a Windows machine or addpath('/home/jack/nn-ocr') if you are using a Unix/Linux machine.

1 Optical Character Recognition

It is often useful to have a machine perform pattern recognition. In particular, machines that can read symbols are very cost effective. A machine that reads banking checks can process many more checks than a human being in the same time. This kind of application saves time and money, and eliminates the requirement that a human perform such a repetitive task.

International Conference on Information Technology: Coding and Computing | 2003

Ontology-based classification of email

Kazem Taghva; Julie Borsack; Jeffrey S. Coombs; Allen Condit; Steven E. Lumos; Thomas A. Nartker

We report on the construction of an ontology that applies rules for identification of features to be used for email classification. The associated probabilities for these features are then calculated from the training set of emails and used as a part of the feature vectors for an underlying Bayesian classifier.

Document Recognition and Retrieval | 2005

Software tools and test data for research and testing of page-reading OCR systems

Thomas A. Nartker; Stephen V. Rice; Steven E. Lumos

We announce the availability of the UNLV/ISRI Analytic Tools for OCR Evaluation together with a large and diverse collection of scanned document images with the associated ground-truth text. This combination of tools and test data will allow anyone to conduct a meaningful test comparing the performance of competing page-reading algorithms. The value of this collection of software tools and test data is enhanced by knowledge of the past performance of several systems using exactly these tools and this data. These performance comparisons were published in previous ISRI Test Reports and are also provided. Another value is that the tools can be used to test the character accuracy of any page-reading OCR system for any language included in the Unicode standard. The paper concludes with a summary of the programs, test data, and documentation that is available and gives the URL where they can be located.

Document Recognition and Retrieval | 2005

Address extraction using hidden Markov models

Kazem Taghva; Jeffrey S. Coombs; Ray Pereda; Thomas A. Nartker

This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be affected negatively by the presence of OCR text.
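
To show the general idea of tagging address tokens with a hidden Markov model, here is a minimal Viterbi decoding sketch. It is not the model from the paper: the two states, the three token classes, the street-word list, and every probability below are made-up assumptions chosen only to make the example run.

```python
import math

# Hypothetical two-state model; all numbers are illustrative, not estimated from data.
STATES = ("ADDR", "OTHER")
START = {"ADDR": 0.1, "OTHER": 0.9}
TRANS = {
    "ADDR": {"ADDR": 0.7, "OTHER": 0.3},
    "OTHER": {"ADDR": 0.3, "OTHER": 0.7},
}
EMIT = {
    "ADDR": {"NUM": 0.3, "STREET": 0.3, "WORD": 0.4},
    "OTHER": {"NUM": 0.02, "STREET": 0.02, "WORD": 0.96},
}
STREET_WORDS = {"street", "st", "avenue", "ave", "road", "rd", "parkway"}


def token_class(token):
    """Map a raw token to one of the coarse observation classes."""
    if token.isdigit():
        return "NUM"
    if token.lower().strip(".,") in STREET_WORDS:
        return "STREET"
    return "WORD"


def viterbi(tokens):
    """Return the most probable state sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []
    observations = [token_class(t) for t in tokens]
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][obs]))
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    tokens = "visit 12 Main Street today".split()
    print(list(zip(tokens, viterbi(tokens))))
```

With these toy parameters, "12 Main Street" is tagged ADDR and the surrounding words OTHER. An OCR error that turns "12" into "l2" would change the token class from NUM to WORD, which hints at how recognition errors could degrade this kind of extraction.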

Document Recognition and Retrieval | 2000

Evaluating text categorization in the presence of OCR errors

Kazem Taghva; Thomas A. Nartker; Julie Borsack; Steven E. Lumos; Allen Condit; Ron Young

In this paper we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
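
A small multinomial naive Bayes sketch can suggest why an isolated OCR error need not change a category decision. This is not the experimental setup of the paper; the training documents, labels, whitespace tokenization, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in text.lower().split():
            if token not in vocabulary:
                # Tokens never seen in training (for example OCR garbage) are ignored.
                continue
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical training data.
    training = [
        ("invoice payment due", "finance"),
        ("payment received invoice", "finance"),
        ("meeting agenda schedule", "admin"),
        ("schedule meeting room", "admin"),
    ]
    model = train_naive_bayes(training)
    print(classify(model, "invoice paymemt due"))  # "paymemt" simulates an OCR error
```

Here the misrecognized token falls outside the vocabulary and contributes nothing, so the remaining words still decide the category. Restricting the vocabulary to frequent or informative terms, one simple form of dimensionality reduction, would discard many such error tokens in the same way.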

Conference on Information and Knowledge Management | 2004

Information access in the presence of OCR errors

Kazem Taghva; Thomas A. Nartker; Julie Borsack

Over the last 15 years, the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted information access research in the presence of OCR errors. Our research has focused on issues associated with the construction of large document databases. In this paper, we will highlight our findings and detail our current activities.

Document Recognition and Retrieval | 2003

The impact of running headers and footers on proximity searching

Kazem Taghva; Julie Borsack; Thomas A. Nartker; Jeffrey S. Coombs; Ron Young

Hundreds of experiments over the last decade on the retrieval of OCR documents performed by the Information Science Research Institute have shown that OCR errors do not significantly affect retrievability. We extend those results to show that in the case of proximity searching, the removal of running headers and footers from OCR text will not improve retrievability for such searches.
