Daniel P. Lopresti | Researchain

Publication

Featured research published by Daniel P. Lopresti.

IEEE Computer | 1991

Building and using a highly parallel programmable logic array

Maya Gokhale; William Holmes; Andrew Kopser; Sara Lucas; Ronald Minnich; Douglas Sweely; Daniel P. Lopresti

A two-slot addition called Splash, which enables a Sun workstation to outperform a Cray-2 on certain applications, is discussed. Following an overview of the Splash design and programming, hardware development is described. The development of the logic description generator is examined in detail. Splash's runtime environment is described, and an example application, that of sequence comparison, is given.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1996

Validation of image defect models for optical character recognition

Yan-Hong Li; Daniel P. Lopresti; George Nagy; Andrew Tomkins

Considers the problem of evaluating character image generators that model distortions encountered in optical character recognition (OCR). While a number of such defect models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. The authors introduce a rigorous and more pragmatic definition of when a model is accurate: they say a defect model is validated if the OCR errors induced by the model are indistinguishable from the errors encountered when using real scanned documents. The authors describe four measures to quantify this similarity, and compare and contrast them using over ten million scanned and synthesized characters in three fonts. The measures differentiate effectively between different fonts and different scans of the same font regardless of the underlying text.

International Journal on Document Analysis and Recognition | 2006

Table-processing paradigms: a research survey

David W. Embley; Matthew Hurst; Daniel P. Lopresti; George Nagy

Tables are a ubiquitous form of communication. While everyone seems to know what a table is, a precise, analytical definition of “tabularity” remains elusive because some bureaucratic forms, multicolumn text layouts, and schematic drawings share many characteristics of tables. There are significant differences between typeset tables, electronic files designed for display of tables, and tables in symbolic form intended for information retrieval. Most past research has addressed the extraction of low-level geometric information from raster images of tables scanned from printed documents, although there is growing interest in the processing of tables in electronic form as well. Recent research on table composition and table analysis has improved our understanding of the distinction between the logical and physical structures of tables, and has led to improved formalisms for modeling tables. This review, which is structured in terms of generalized paradigms for table processing, indicates that progress on half-a-dozen specific research issues would open the door to using existing paper and electronic tables for database update, tabular browsing, structured information retrieval through graphical and audio interfaces, multimedia table editing, and platform-independent display.

Theoretical Computer Science | 1997

Block edit models for approximate string matching

Daniel P. Lopresti; Andrew Tomkins

In this paper we examine string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence. This model accounts for certain phenomena encountered in important real-world applications, including pen computing and molecular biology. The basic problem admits a family of variations depending on whether the strings must be matched in their entireties, and whether overlap is permitted. We show that several variants are NP-complete, and give polynomial-time algorithms for solving the remainder.

International Conference on Document Analysis and Recognition | 1997

Extracting text from WWW images

Jiangying Zhou; Daniel P. Lopresti

The authors examine the problem of locating and extracting text from images on the World Wide Web. They describe a text detection algorithm which is based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns text-like components into text lines. Experimental results suggest this approach is promising despite the challenging nature of the input data.
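
The first two stages described above (quantize colors, then find text-like connected components per color class) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it substitutes simple uniform color binning for the parameter-free clustering procedure, the shape test is a single hypothetical size threshold (`max_frac`), and the final step that aligns components into text lines is omitted. All function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Box = Tuple[int, int, int, int]       # (top, left, bottom, right), inclusive


def quantize(pixel: Pixel, levels: int = 2) -> Tuple[int, ...]:
    """Map a pixel to a coarse color class by uniform binning (an assumption)."""
    step = 256 // levels
    return tuple(min(c // step, levels - 1) for c in pixel)


def components_by_color(image: List[List[Pixel]], levels: int = 2) -> List[Tuple[Tuple[int, ...], Box, int]]:
    """Return (color class, bounding box, pixel count) for each 4-connected component."""
    h, w = len(image), len(image[0])
    labels = [[quantize(p, levels) for p in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            cls = labels[y][x]
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, size = y, x, y, x, 0
            while queue:  # breadth-first flood fill within one color class
                cy, cx = queue.popleft()
                size += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and labels[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append((cls, (top, left, bottom, right), size))
    return comps


def text_like(box: Box, image_h: int, image_w: int, max_frac: float = 0.8) -> bool:
    """Hypothetical shape test: reject components spanning most of the image."""
    top, left, bottom, right = box
    return (bottom - top + 1) <= max_frac * image_h and (right - left + 1) <= max_frac * image_w


# Toy 5x7 image: white background with two red vertical strokes.
W, R = (255, 255, 255), (200, 0, 0)
toy = [
    [W, W, W, W, W, W, W],
    [W, R, W, W, R, W, W],
    [W, R, W, W, R, W, W],
    [W, R, W, W, R, W, W],
    [W, W, W, W, W, W, W],
]
candidates = [box for _, box, _ in components_by_color(toy) if text_like(box, 5, 7)]
```

On the toy image, the background is rejected because it spans the full image, leaving the two strokes as candidates.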

International Journal on Document Analysis and Recognition | 2002

Evaluating the performance of table processing algorithms

Jianying Hu; Ramanujan S. Kashi; Daniel P. Lopresti; Gordon T. Wilfong

While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well.
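
The four error classes named above can be made concrete with a small sketch. It assumes tables are represented as half-open line ranges and counts errors purely by overlap; this is a simplified illustration, not the edit distance formulation the paper develops, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # half-open line range [start, end); a representation assumed here


def overlaps(a: Region, b: Region) -> bool:
    """True if the two line ranges share at least one line."""
    return a[0] < b[1] and b[0] < a[1]


def classify_detection_errors(truth: List[Region], detected: List[Region]) -> Dict[str, int]:
    """Count insertion, deletion, splitting, and merging errors by overlap."""
    counts = {"insertion": 0, "deletion": 0, "splitting": 0, "merging": 0}
    for d in detected:
        hits = sum(overlaps(d, t) for t in truth)
        if hits == 0:
            counts["insertion"] += 1   # detected table with no ground-truth counterpart
        elif hits > 1:
            counts["merging"] += 1     # one detected table covers several true tables
    for t in truth:
        hits = sum(overlaps(t, d) for d in detected)
        if hits == 0:
            counts["deletion"] += 1    # true table missed completely
        elif hits > 1:
            counts["splitting"] += 1   # one true table broken into several detections
    return counts


truth = [(0, 5), (10, 15), (20, 25), (30, 35), (40, 45)]
detected = [(0, 2), (3, 5), (20, 35), (50, 55)]
errors = classify_detection_errors(truth, detected)
```

In this toy case the first true table is split in two, the second and fifth are missed, the third and fourth are merged, and the last detection matches nothing.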

Information Retrieval | 2000

Locating and Recognizing Text in WWW Images

Daniel P. Lopresti; Jiangying Zhou

The explosive growth of the World Wide Web has resulted in a distributed database consisting of hundreds of millions of documents. While existing search engines index a page based on the text that is readily extracted from its HTML encoding, an increasing amount of the information on the Web is embedded in images. This situation presents a new and exciting challenge for the fields of document analysis and information retrieval, as WWW image text is typically rendered in color and at very low spatial resolutions. In this paper, we survey the results of several years of our work in the area. For the problem of locating text in Web images, we describe a procedure based on clustering in color space followed by a connected-components analysis that seems promising. For character recognition, we discuss techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers. Also presented are the results of several experiments that demonstrate where our methods perform well and where more work needs to be done. We conclude with a discussion of topics for further research.

Archive | 2002

Document Analysis Systems V

Daniel P. Lopresti; Jianying Hu; Ramanujan S. Kashi

Document images are degraded through bilevel processes such as scanning, printing, and photocopying. The resulting image degradations can be categorized based either on observable degradation features or on degradation model parameters. The degradation features can be related mathematically to model parameters. In this paper we statistically compare pairs of populations of degraded character images created with different model parameters. The changes in the probability that the characters are from different populations when the model parameters vary correlate with the relationship between observable degradation features and the model parameters. The paper also shows which features have the largest impact on the image.

International Conference on Document Analysis and Recognition | 1993

Certifiable optical character recognition

Daniel P. Lopresti; Jonathan S. Sandberg

A general-purpose approach for enhancing the accuracy of optical character recognition is described. By taking the view that the printed page is a data transmission channel, the authors raise the possibility of error detecting/correcting codes designed specifically for the OCR process. They present experimental results that demonstrate the feasibility of fully automated, 100% accurate OCR for computer typeset documents.

Document Recognition and Retrieval | 1999

Medium-independent table detection

Jianying Hu; Ramanujan S. Kashi; Daniel P. Lopresti; Gordon T. Wilfong

An important step towards the goal of table understanding is a method for reliable table detection. This paper describes a general solution for detecting tables based on computing an optimal partitioning of a document into some number of tables. A dynamic programming algorithm is given to solve the resulting optimization problem. This high-level framework is independent of any particular table quality measure and independent of the document medium. Moreover, it does not rely on the presence of ruling lines or other table delimiters. We also present table quality measures based on white space correlation and vertical connected component analysis. These measures can be applied equally well to ASCII text and scanned images. We report on some preliminary experiments using this method to detect tables in both ASCII text and scanned images, yielding promising results. We present detailed evaluation of these results using three different criteria which by themselves pose interesting research questions.
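
The optimal-partitioning idea can be sketched with a short dynamic program. This is a minimal illustration under assumptions, not the paper's algorithm: the document is treated as a sequence of lines, a table is a contiguous half-open line range, and `quality(i, j)` is a caller-supplied score for treating lines `[i, j)` as a table. The toy quality function below is hypothetical and stands in for measures such as white space correlation.

```python
from typing import Callable, List, Optional, Tuple


def partition_into_tables(
    n_lines: int,
    quality: Callable[[int, int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick non-overlapping line ranges [i, j) that maximize total quality."""
    best = [0.0] * (n_lines + 1)                         # best[j]: best score for lines [0, j)
    choice: List[Optional[int]] = [None] * (n_lines + 1)  # start of a table ending at j, if any
    for j in range(1, n_lines + 1):
        best[j] = best[j - 1]                            # option: line j-1 is not in a table
        for i in range(j):
            score = best[i] + quality(i, j)              # option: lines [i, j) form a table
            if score > best[j]:
                best[j] = score
                choice[j] = i
    # Walk back through the recorded choices to recover the tables.
    tables = []
    j = n_lines
    while j > 0:
        i = choice[j]
        if i is None:
            j -= 1
        else:
            tables.append((i, j))
            j = i
    tables.reverse()
    return best[n_lines], tables


# Hypothetical per-line evidence: 1 if a line looks tabular, 0 otherwise.
flags = [0, 1, 1, 1, 0, 0, 1, 1, 0]


def toy_quality(i: int, j: int) -> float:
    """+1 per tabular line, -1 per non-tabular line, minus a fixed per-table cost."""
    return sum(1.0 if f else -1.0 for f in flags[i:j]) - 0.5


total, tables = partition_into_tables(len(flags), toy_quality)
```

Because the quality measure is passed in as a function, the same search works for any scoring scheme, which mirrors the abstract's point that the framework is independent of the particular table quality measure. The double loop evaluates the quality function O(n²) times for n lines.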
