

Publication


Featured research published by Gerald T. Candela.


Journal of the Association for Information Science and Technology | 1990

Retrieving Records from a Gigabyte of Text on a Minicomputer Using Statistical Ranking.

Donna Harman; Gerald T. Candela

Statistically based ranked retrieval of records using keywords provides many advantages over traditional Boolean retrieval methods, especially for end users. This approach to retrieval, however, has not seen widespread use in large operational retrieval systems. To show the feasibility of this retrieval methodology, research was done to produce very fast search techniques using these ranking algorithms, and then to test the results against large databases with many end users. The results show not only response times on the order of 1.5 seconds for 806 megabytes of text, but also very favorable user reaction. Novice users were able to consistently obtain good search results after 5 minutes of training. Additional work was done to devise new indexing techniques to create inverted files for large databases using a minicomputer. These techniques use no sorting, require a working space of only about 20% of the size of the input text, and produce indices that are about 14% of the input text size.
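The abstract does not give the exact weighting formula, so the following is only a minimal sketch of statistically ranked retrieval over an inverted index, using a tf-idf-style score (an illustrative assumption, not necessarily the weighting Harman and Candela used; the toy corpus and function names are made up):

```python
import math
from collections import defaultdict

# Toy corpus; in the paper the collection was hundreds of megabytes of text.
docs = {
    1: "fast ranked retrieval of text records",
    2: "boolean retrieval of text records",
    3: "statistical ranking for end users",
}

# Build an inverted index: term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

def rank(query, n_docs=len(docs)):
    """Score documents by summed tf-idf weights of the query terms."""
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Return (doc_id, score) pairs, best first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank("statistical ranking"))  # doc 3 ranks first
```

Because scoring touches only the postings lists of the query terms, response time depends on query vocabulary rather than collection size, which is consistent with the sub-two-second times the abstract reports.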


Pattern Recognition | 1994

Evaluation of Pattern Classifiers for Fingerprint and OCR Applications

James L. Blue; Gerald T. Candela; Patrick J. Grother; Rama Chellappa; Charles L. Wilson

The classification accuracy of four statistical and three neural network classifiers is evaluated on two image-based pattern classification problems: optical character recognition (OCR) for isolated handprinted digits, and fingerprint classification. It is hoped that the evaluation results reported will be useful to designers of practical systems for these two important commercial applications. For the OCR problem, the Karhunen-Loeve (K-L) transform of the images is used to generate the input feature set; for the fingerprint problem, the K-L transform of the ridge directions is used. The statistical classifiers used are Euclidean minimum distance, quadratic minimum distance, normal, and k-nearest neighbor. The neural network classifiers used are multi-layer perceptron, radial basis function, and probabilistic neural network. The OCR data consist of 7480 digit images for training and 23,140 digit images for testing; the fingerprint data consist of 2000 training and 2000 testing images. In addition to accuracy, the multi-layer perceptron and radial basis function networks are evaluated for size and generalization capability. For the evaluated datasets, the best accuracy obtained on either problem is provided by a probabilistic neural network: minimum classification error is 2.5% for OCR and 7.2% for fingerprints.
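The Karhunen-Loeve transform used here for feature extraction is mathematically equivalent to principal component analysis: center the data, then project onto the leading eigenvectors of the sample covariance matrix. A self-contained sketch with made-up data dimensions (random vectors standing in for flattened character images):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for flattened character images: 100 samples, 64 pixels each.
images = rng.standard_normal((100, 64))

# Karhunen-Loeve transform: project onto the top eigenvectors of the
# sample covariance matrix (identical to PCA on centered data).
mean = images.mean(axis=0)
centered = images - mean
cov = centered.T @ centered / (len(images) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending
k = 16                                   # keep 16 K-L features
basis = eigvecs[:, order[:k]]

features = centered @ basis              # (100, 16) input feature set
print(features.shape)
```

The resulting features are uncorrelated and ordered by decreasing variance, which is exactly the property the nearest-neighbor speedups in the 1997 paper below exploit.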


NIST Interagency/Internal Report (NISTIR) - 5469 | 1994

NIST Form-Based Handprint Recognition System

Michael D. Garris; James L. Blue; Gerald T. Candela; D L. Dommick; Jon C. Geist; Patrick J. Grother; Stanley Janet; Charles L. Wilson



Pattern Recognition | 1997

Fast implementations of nearest neighbor classifiers

Patrick J. Grother; Gerald T. Candela; James L. Blue

Standard implementations of non-parametric classifiers have large computational requirements. Parzen classifiers use the distances of an unknown vector to all N prototype samples, and consequently exhibit O(N) behavior in both memory and time. We describe four techniques for expediting nearest neighbor methods: replacing the linear search with a new k-d tree method, exhibiting approximately O(N^(1/2)) behavior; employing an L∞ instead of an L2 distance metric; using variance-ordered features; and rejecting prototypes by evaluating distances in low-dimensionality subspaces. We demonstrate that variance-ordered features yield significant efficiency gains over the same features linearly transformed to have uniform variance. We give results for a large OCR problem, but note that the techniques expedite recognition for arbitrary applications. Three of the four techniques preserve recognition accuracy.
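Two of the described speedups, the L∞ metric and prototype rejection via low-dimensional partial distances, combine cleanly: a distance computed over a subset of coordinates lower-bounds the full L∞ distance, so rejection loses no accuracy. A minimal sketch, assuming features are already ordered by decreasing variance (the k-d tree variant is omitted, and all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
prototypes = rng.standard_normal((1000, 32))
query = rng.standard_normal(32)

def nearest_partial(query, prototypes, check_dims=8):
    """Nearest neighbor under the L-infinity metric, rejecting prototypes
    early using only the first `check_dims` (highest-variance) features."""
    best_dist, best_idx = np.inf, -1
    for i, p in enumerate(prototypes):
        # Partial distance over a low-dimensional subspace first; this
        # lower-bounds the full L-infinity distance, so rejection is exact.
        partial = np.max(np.abs(query[:check_dims] - p[:check_dims]))
        if partial >= best_dist:
            continue  # cannot beat the current best; reject cheaply
        d = np.max(np.abs(query - p))  # full L-infinity distance
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, best_dist

idx, dist = nearest_partial(query, prototypes)
# Sanity check against a brute-force full scan:
brute = int(np.argmin(np.max(np.abs(prototypes - query), axis=1)))
assert idx == brute
```

Putting the highest-variance features first makes the partial bound tight early, which is why the paper reports that variance ordering beats the same features rescaled to uniform variance.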


Information Processing and Management | 1991

Prototyping a distributed information retrieval system that uses statistical ranking

Donna Harman; Wayne McCoy; Robert Toense; Gerald T. Candela

Centralized systems continue to dominate the information retrieval market, with increased competition from CD-ROM-based systems. As more large organizations begin to implement office automation systems, however, many will find that neither of these types of retrieval systems will satisfy their requirements, especially those requirements involving easy integration into other systems and heavy usage by casual end users. A prototype distributed information retrieval system was designed and built using a distributed architecture and using statistical ranking techniques to help provide better service for the end user. The distributed architecture was shown to be a feasible alternative to centralized or CD-ROM information retrieval, and user testing of the ranking methodology showed both widespread user enthusiasm for this retrieval technique and very fast response times (on the order of one second for 300 megabytes of data).


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1989

A very fast prototype retrieval system using statistical ranking

Donna Harman; Gerald T. Candela

The most striking result in working with Professor Gerald Salton over 20 years ago on the comparison between the SMART system and the MEDLARS system [Salton69] was the fact that whereas Boolean retrieval (MEDLARS) did very well or very poorly, the SMART system always seemed to find some of the relevant records. All of us working at Cornell University during that time wanted to run a full-scale comparison between these systems to demonstrate what was to us the clear superiority of a ranking retrieval system, but this was not possible due to lack of funding and other problems.


ACM SIGCHI Bulletin | 1990

Bringing natural language information retrieval out of the closet

Donna Harman; Gerald T. Candela

A prototype information retrieval system was developed that gives users fast and easy access to textual information. This system uses a statistical ranking methodology that allows a user to input a query using only natural language, such as a sentence or a noun phrase, with no special syntax required. The system returns a set of text titles or descriptions, ranked in order of likely relevance to the query. The user can then select one or more titles for further examination of the corresponding text. The prototype was tested by over forty users, all proficient in doing manual research in the subject area, but few proficient in doing online research. The system was very fast, providing response times on the order of one second for searching a gigabyte of data and was also very effective, retrieving at least one relevant record within the first ten records retrieved for 53 out of 68 test queries. All users were able to get satisfactory results within a short time after seeing a demonstration, and those that had never used an online retrieval system did as well as those with experience. This is in sharp contrast to Boolean based retrieval systems where continual use is necessary to obtain consistently good results.


Systems, Man and Cybernetics | 1995

Off-line handwriting recognition from forms

Michael D. Garris; James L. Blue; Gerald T. Candela; Darrin L. Dimmick; Jon C. Geist; Patrick J. Grother; Stanley Janet; Charles L. Wilson

A public domain optical character recognition (OCR) system has been developed by the National Institute of Standards and Technology (NIST) to provide a baseline of performance on off-line handwriting recognition from forms. The system's source code, training data, and performance assessment tools are all publicly available. The system recognizes the handprint written on handwriting sample forms as distributed on the CD-ROM, NIST Special Database 19. The public domain package contains a number of significant contributions to OCR technology, including an optimized probabilistic neural network classifier that operates a factor of 20 times faster than traditional software implementations of this algorithm. The modular design of the software makes it useful for training and testing set validation, multiple system voting schemes, and component evaluation and comparison. As an example, the OCR results from two versions of the recognition system are presented and analyzed.
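The probabilistic neural network used as the classifier in the NIST system is, at its core, a Parzen-window classifier: each class's score is a sum of Gaussian kernels centered on that class's training prototypes. A minimal sketch (the kernel width sigma, the toy clusters, and the function name are made-up illustrations, not the optimized NIST implementation):

```python
import numpy as np

def pnn_classify(x, train_x, train_y, sigma=1.0):
    """Probabilistic neural network: each class score is the sum of
    Gaussian kernels centered on that class's training prototypes."""
    sq_dists = np.sum((train_x - x) ** 2, axis=1)
    kernels = np.exp(-sq_dists / (2.0 * sigma ** 2))
    classes = np.unique(train_y)
    scores = np.array([kernels[train_y == c].sum() for c in classes])
    return classes[np.argmax(scores)]

# Two well-separated toy clusters standing in for digit classes.
rng = np.random.default_rng(2)
a = rng.normal(0.0, 0.3, size=(20, 2))
b = rng.normal(3.0, 0.3, size=(20, 2))
train_x = np.vstack([a, b])
train_y = np.array([0] * 20 + [1] * 20)

print(pnn_classify(np.array([0.1, -0.2]), train_x, train_y))  # class 0
print(pnn_classify(np.array([2.9, 3.1]), train_x, train_y))   # class 1
```

The naive version evaluates one kernel per training prototype per query; the factor-of-20 speedup the abstract mentions comes from optimizing exactly this inner loop.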


Journal of Electronic Imaging | 1997

Design of a handprint recognition system

Michael D. Garris; James L. Blue; Gerald T. Candela; Darrin L. Dimmick; Jon C. Geist; Patrick J. Grother; Stanley Janet; Omid M. Omidvar; Charles L. Wilson

A public domain optical character recognition (OCR) system has been developed by the National Institute of Standards and Technology (NIST). This standard reference form-based handprint recognition system is designed to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system is modular, allowing for system component testing and comparisons, and it can be used to validate training and testing sets in an end-to-end application. The system's source code is written in C and will run on virtually any UNIX-based computer. The presented functional components of the system are divided into three levels of processing: (1) form-level processing includes the tasks of form registration and form removal; (2) field-level processing includes the tasks of field isolation, line trajectory reconstruction, and field segmentation; and (3) character-level processing includes character normalization, feature extraction, character classification, and dictionary-based postprocessing. The system contains a number of significant contributions to OCR technology, including an optimized probabilistic neural network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. Provided in the system are a host of data structures and low-level utilities for computing spatial histograms, least-squares fitting, spatial zooming, connected components, Karhunen-Loève feature extraction, optimized PNN classification, and dynamic string alignment. Any portion of this standard reference OCR system can be used in commercial products without restrictions.


Machine Vision and Applications | 1992

Massively parallel implementation of character recognition systems

Michael D. Garris; Charles L. Wilson; James L. Blue; Gerald T. Candela; Patrick J. Grother; Stanley Janet; R. A. Wilkinson

A massively parallel character recognition system has been implemented. The system is designed to study the feasibility of the recognition of handprinted text in a loosely constrained environment. The NIST handprint database, NIST Special Database 1, is used to provide test data for the recognition system. The system consists of eight functional components. The loading of the image into the system and storing the recognition results from the system are I/O components. In between are components responsible for image processing and recognition. The first image processing component is responsible for image correction for scale and rotation, data field isolation, and character data location within each field; the second performs character segmentation; and the third does character normalization. Three recognition components are responsible for feature extraction and character reconstruction, neural network-based character recognition, and low-confidence classification rejection. The image processing to load and isolate 34 fields on a scientific workstation takes 900 seconds. The same processing takes only 11 seconds using a massively parallel array processor. The image processing components, including the time to load the image data, use 94% of the system time. The segmentation time is 15 ms/character and segmentation accuracy is 89% for handprinted digits and alphas. Character recognition accuracy for medium-quality machine print is 99.8%. On handprinted digits, the recognition accuracy is 96% and recognition speeds of 10,100 characters/second can be realized. The limiting factor in the recognition portion of the system is feature extraction, which occurs at 806 characters/second. Through the use of a massively parallel machine and neural recognition algorithms, significant improvements in both accuracy and speed have been achieved, making this technology effective as a replacement for key data entry in existing data capture systems.

Collaboration


Dive into Gerald T. Candela's collaboration network.

Top Co-Authors (all at the National Institute of Standards and Technology)

Patrick J. Grother
Charles L. Wilson
James L. Blue
Stanley Janet
Donna Harman
Michael D. Garris
Craig I. Watson
Jon C. Geist
R. A. Wilkinson
Darrin L. Dimmick