Henry S. Baird | Researchain

Publication

Featured research published by Henry S. Baird.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1987

On the Recognition of Printed Characters of Any Font and Size

Simon Kahan; Theo Pavlidis; Henry S. Baird

We describe the current state of a system that recognizes printed text of various fonts and sizes for the Roman alphabet. The system combines several techniques in order to improve the overall recognition rate. Thinning and shape extraction are performed directly on a graph of the run-length encoding of a binary image. The resulting strokes and other shapes are mapped, using a shape-clustering approach, into binary features which are then fed into a statistical Bayesian classifier. Large-scale trials have shown better than 97 percent top choice correct performance on mixtures of six dissimilar fonts, and over 99 percent on most single fonts, over a range of point sizes. Certain remaining confusion classes are disambiguated through contour analysis, and characters suspected of being merged are broken and reclassified. Finally, layout and linguistic context are applied. The results are illustrated by sample pages.
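
The run-length representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a binary image stored as lists of 0/1 pixels, and it links runs in consecutive rows when their column ranges overlap. The function names and the overlap rule are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start column, length) of a run of black pixels


def run_length_encode(row: List[int]) -> List[Run]:
    """Return (start, length) for each maximal run of 1s in one image row."""
    runs = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                        # a new black run begins
        elif not pixel and start is not None:
            runs.append((start, x - start))  # the current run ends
            start = None
    if start is not None:                    # run touching the right border
        runs.append((start, len(row) - start))
    return runs


def encode_image(image: List[List[int]]) -> List[List[Run]]:
    """Run-length encode every row of a binary image."""
    return [run_length_encode(row) for row in image]


def adjacency_edges(encoded: List[List[Run]]):
    """Link runs in consecutive rows whose column ranges overlap (assumed rule)."""
    edges = []
    for y in range(len(encoded) - 1):
        for i, (s1, l1) in enumerate(encoded[y]):
            for j, (s2, l2) in enumerate(encoded[y + 1]):
                if s1 < s2 + l2 and s2 < s1 + l1:
                    edges.append(((y, i), (y + 1, j)))
    return edges
```

For example, `run_length_encode([0, 1, 1, 0, 1])` returns `[(1, 2), (4, 1)]`. Working on runs rather than pixels keeps the graph small, since a run of any width is a single node.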

Document Image Analysis | 1995

Document image defect models

Henry S. Baird

A lack of explicit quantitative models of imaging defects due to printing, optics, and digitization has retarded progress in some areas of document image analysis, including syntactic and structural approaches. Establishing the essential properties of such models, such as completeness (expressive power) and calibration (closeness of fit to actual image populations), remains an open research problem. Work in progress towards a parameterized model of local imaging defects is described, together with a variety of motivating theoretical arguments and empirical evidence. A pseudo-random image generator implementing the model has been built. Applications of the generator are described, including a polyfont classifier for ASCII and a single-font classifier for a large alphabet (Tibetan U-Chen), both of which were constructed with a minimum of manual effort. Image defect models and their associated generators permit a new kind of image database which is explicitly parameterized and indefinitely extensible, alleviating some drawbacks of existing databases.
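
To make the idea of a parameterized pseudo-random image generator concrete, here is a minimal sketch. The three parameters used (a fixed 3x3 box blur, per-pixel Gaussian noise, and a binarization threshold) are illustrative assumptions chosen for brevity. They are not the parameter set of the model described in the abstract.

```python
import random
from typing import List


def degrade(image: List[List[int]],
            noise_sigma: float = 0.1,
            threshold: float = 0.5,
            seed: int = 0) -> List[List[int]]:
    """Blur a binary image, add per-pixel noise, then re-binarize.

    All parameters are hypothetical stand-ins for a real defect model.
    """
    rng = random.Random(seed)  # seeded, so each image is reproducible
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Mean of the 3x3 neighbourhood, clipped at the image border.
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            gray = sum(vals) / len(vals)
            gray += rng.gauss(0.0, noise_sigma)   # per-pixel sensor noise
            row.append(1 if gray > threshold else 0)
        out.append(row)
    return out
```

Because the generator is driven by an explicit seed and explicit parameters, every synthetic image can be stored together with the exact settings that produced it, which is the property the abstract attributes to parameterized image databases.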

Document Analysis Systems | 2007

The State of the Art of Document Image Degradation Modelling

Henry S. Baird

The literature on models of document image degradation is reviewed, and open problems are listed. In response to the unpleasant fact that the accuracy of document recognition algorithms falls drastically when image quality degrades even slightly, researchers in the last decade have intensified their study of explicit, quantitative, parameterized models of image defects that occur during printing and scanning. Several models have been proposed, some motivated by the physics of image formation and others by the surface statistics of image distributions. A wide range of techniques for estimating parameters of these models has been explored. These models, in the form of pseudo-random generators of synthetic images, permit, for the first time, investigations into fundamental properties of concrete image recognition problems including the Bayes error of problems and the asymptotic accuracy and domain of competency of classifier technologies. The use of massive sets of synthetic images, in the construction and testing of high-performance classifiers, has accelerated in the last few years. Open problems include the search for methods for comparing competing models and sound methodologies for the use of synthetic data in engineering.

Archive | 1992

Structured Document Image Analysis

Henry S. Baird; Horst Bunke; Kazuhiko Yamamoto

Document image analysis is the automatic computer interpretation of images of printed and handwritten documents, including text, drawings, maps, music scores, etc. Research in this field supports a rapidly growing international industry. This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architecture of complete high-performance printed-document reading systems. A unique feature is the extended section on music notation, an ideal vehicle for international sharing of basic research. Also, the collection includes important new work on line drawings, handwriting, character and symbol recognition, and basic methodological issues. The IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition is summarized, including the reports of its expert working groups, whose debates provide a fascinating perspective on the field. The book is an excellent text for a first-year graduate seminar in document image analysis, and is likely to remain a standard reference in the field for years.

International Conference on Document Analysis and Recognition | 1993

Document image defect models and their uses

Henry S. Baird

The accuracy of today's document recognition algorithms falls abruptly when image quality degrades even slightly. In an effort to surmount this barrier, researchers have in recent years intensified their study of explicit, quantitative, parameterized models of the image defects that occur during printing and scanning. The author reviews the recent literature and discusses the form these models might take. A preview of a large public-domain database of character images, labeled with ground-truth including all defect model parameters, is given. The use of massive pseudo-randomly generated training sets for the construction of high-performance decision trees for preclassification is described. In a more theoretical vein, the author reports preliminary results in the estimation of the intrinsic error of precisely-specified text recognition problems. Finally, the author calls attention to some open problems.

International Conference on Pattern Recognition | 1990

Image segmentation by shape-directed covers

Henry S. Baird; Susan Elizabeth Jones; Steven Fortune

A technique for image segmentation using shape-directed covers is described and applied to the fully automatic analysis of complex printed-page layouts. The structure of the background (white space) is analyzed, assisted by an enumeration of all maximal white rectangles. For this enumeration, the most computationally expensive step, an algorithm has been developed that, aside from a sort, achieves an expected runtime linear in the number of black connected components. The crucial engineering decision is the specification of a partial order on white rectangles to express domain-specific knowledge of preferred shapes and sizes. This order determines a sequence of partial covers of the background, and thus, a sequence of nested page segmentations. In experimental trials on Manhattan layouts, good segmentations often occur early in this sequence, using a simple and uniform shape-direction rule. This is a global-to-local strategy, which for some tasks is superior to strategies currently emphasized in the literature, including bottom-up and top-down.
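
What "all maximal white rectangles" means can be shown with a brute-force sketch on a small pixel grid. This is deliberately not the expected linear-time algorithm mentioned in the abstract, which works on black connected components. It is an exhaustive check, practical only for tiny inputs, and the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def is_white(grid: List[List[int]], top: int, left: int,
             bottom: int, right: int) -> bool:
    """True if every pixel in the inclusive rectangle is white (0)."""
    return all(grid[y][x] == 0
               for y in range(top, bottom + 1)
               for x in range(left, right + 1))


def maximal_white_rectangles(grid: List[List[int]]) -> List[Rect]:
    """Enumerate white rectangles that cannot grow in any direction."""
    h, w = len(grid), len(grid[0])
    found = []
    for top in range(h):
        for bottom in range(top, h):
            for left in range(w):
                for right in range(left, w):
                    if not is_white(grid, top, left, bottom, right):
                        continue
                    # Try to extend by one row or column on each side.
                    up = top > 0 and is_white(grid, top - 1, left, top - 1, right)
                    down = bottom < h - 1 and is_white(grid, bottom + 1, left, bottom + 1, right)
                    lft = left > 0 and is_white(grid, top, left - 1, bottom, left - 1)
                    rgt = right < w - 1 and is_white(grid, top, right + 1, bottom, right + 1)
                    if not (up or down or lft or rgt):
                        found.append((top, left, bottom, right))
    return found
```

On a 3x3 grid with a single black pixel in the centre, the result is the four border strips: the top row, the bottom row, the left column, and the right column. A partial order on such rectangles, as described above, would then decide which of them are used first to cover the background.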

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2000

A statistical, nonparametric methodology for document degradation model validation

Tapas Kanungo; Robert M. Haralick; Henry S. Baird; Werner Stuetzle; David Madigan

Printing, photocopying, and scanning processes degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. In this paper, we present a statistical methodology that can be used to validate local degradation models. This method is based on a nonparametric, two-sample permutation test. Another standard statistical device, the power function, is then used to choose between algorithm variables such as distance functions. Since the validation and the power function procedures are independent of the model, they can be used to validate any other degradation model. A method for comparing any two models is also described. It uses p-values associated with the estimated models to select the model that is closer to the real world.
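
The two-sample permutation test named in this abstract can be sketched in its generic form. The sketch assumes each sample is a list of numbers (for instance, one scalar measurement per real or synthetic image) and uses the absolute difference of means as the test statistic. Both choices are assumptions made for illustration, since the paper compares candidate distance functions rather than fixing one.

```python
import random
from typing import Sequence


def permutation_test(sample_a: Sequence[float],
                     sample_b: Sequence[float],
                     n_permutations: int = 2000,
                     seed: int = 0) -> float:
    """Estimate a p-value for the hypothesis that both samples
    come from the same distribution."""
    rng = random.Random(seed)

    def statistic(a, b):
        # Assumed test statistic: absolute difference of sample means.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled data
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

In a validation setting, a small p-value would suggest that the synthetic sample is distinguishable from the real one, which counts as evidence against the degradation model that generated it.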

Archive | 1992

A Critical Survey of Music Image Analysis

Dorothea Blostein; Henry S. Baird

The research literature concerning the automatic analysis of images of printed and handwritten music notation, for the period 1966 through 1990, is surveyed and critically examined.

International Conference on Pattern Recognition | 1990

Handwritten zip code recognition with multilayer networks

Y. Le Cun; Ofer Matan; Bernhard E. Boser; John S. Denker; D. Henderson; R. E. Howard; W. Hubbard; L.D. Jackel; Henry S. Baird

An application of back-propagation networks to handwritten zip code recognition is presented. Minimal preprocessing of the data is required, but the architecture of the network is highly constrained and specifically designed for the task. The input of the network consists of size-normalized images of isolated digits. The performance on zip code digits provided by the US Postal Service is 92% recognition, 1% substitution, and 7% rejects. Structured neural networks can be viewed as statistical methods with structure which bridge the gap between purely statistical and purely structural methods.

International Journal of Pattern Recognition and Artificial Intelligence | 1994

Background Structure in Document Images

Henry S. Baird

A method for analyzing the structure of the white background in document images is described, along with applications to the problem of isolating blocks of machine-printed text. The approach is based on computational-geometry algorithms for off-line enumeration of maximal white rectangles and on-line rectangle unification. These support a fast, simple, and general heuristic for geometric layout segmentation, in which white space is covered greedily by rectangles until all text blocks are isolated. Design of the heuristic can be substantially automated by an analysis of the empirical statistical distribution of properties of covering rectangles: for example, the stopping rule can be chosen by Rosenblatt’s perceptron training algorithm. Experimental trials show good behavior on the large and useful class of textual Manhattan layouts. On complex layouts from English-language technical journals of many publishers, the method finds good segmentations in a uniform and nearly parameter-free manner. On a variety of non-Latin texts, some with vertical text lines, the method finds good segmentations without prior knowledge of page and text-line orientation.
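
The abstract mentions choosing the stopping rule with Rosenblatt's perceptron training algorithm. A minimal sketch of that algorithm follows. The feature vectors are hypothetical: each one is assumed to hold a few numeric properties of a covering rectangle, labelled +1 if the cover should continue and -1 if it should stop. The paper's actual features are not given here.

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     epochs: int = 100) -> Tuple[List[float], float]:
    """Rosenblatt's perceptron rule; labels must be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified, or on the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:            # every training sample is now separated
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float,
            x: Sequence[float]) -> int:
    """Return +1 or -1 for a single feature vector."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1
```

The training loop stops early only when the data are linearly separable. Otherwise it runs for the full number of epochs and returns the last weights, so the result should be checked against held-out examples.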
