Publication


Featured research published by Junichi Kanai.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1995

Automated evaluation of OCR zoning

Junichi Kanai; Stephen V. Rice; Thomas A. Nartker; George Nagy

Many current optical character recognition (OCR) systems attempt to decompose printed pages into a set of zones, each containing a single column of text, before converting the characters into coded form. The authors present a methodology for automatically assessing the accuracy of such decompositions, and demonstrate its use in evaluating six OCR systems.
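
By way of illustration, zoning mistakes such as merged or re-ordered columns show up as text that no longer aligns with the ground truth, so the cost of correcting them can be estimated by string comparison. Below is a minimal sketch of that idea in Python using difflib; the function name and the use of insertions plus deletions as the cost measure are illustrative assumptions, not the paper's exact cost model (which is richer).

```python
from difflib import SequenceMatcher

def zoning_error_cost(ground_truth: str, ocr_text: str) -> int:
    """Approximate number of character corrections needed to fix the
    OCR output. Matching blocks are characters that survived zoning
    and recognition; everything else must be deleted or inserted."""
    sm = SequenceMatcher(a=ground_truth, b=ocr_text, autojunk=False)
    matched = sum(m.size for m in sm.get_matching_blocks())
    return (len(ground_truth) - matched) + (len(ocr_text) - matched)
```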


International Conference on Document Analysis and Recognition | 1997

Projection profile based skew estimation algorithm for JBIG compressed images

Andrew D. Bagdanov; Junichi Kanai

A new projection-profile-based skew estimation algorithm is presented. It extracts fiducial points corresponding to objects on a page by decoding a JBIG compressed image. These points are projected along parallel lines into an accumulator array. The angle of projection within a search interval that maximizes alignment of the fiducial points is the skew angle. This algorithm and three other algorithms were tested. Results showed that the new algorithm performed comparably to the other algorithms. The JBIG progressive coding scheme reduces the effects of noise and graphics, and the accuracy of the new algorithm on 75 dpi unfiltered images and 300 dpi filtered images was similar.
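
The core of the algorithm can be sketched compactly. The snippet below assumes the JBIG decoding step has already produced an (N, 2) array of fiducial-point coordinates, and it uses the sum of squared profile bins as the alignment criterion, a common choice for projection-profile methods; the paper's exact criterion and search strategy may differ.

```python
import numpy as np

def estimate_skew(points, angles_deg, n_bins=512):
    """Projection-profile skew estimation.

    points: (N, 2) array of (x, y) fiducial-point coordinates.
    Returns the candidate angle whose projection profile has the
    greatest alignment energy (sum of squared bin counts)."""
    best_angle, best_energy = 0.0, -np.inf
    for a in angles_deg:
        t = np.deg2rad(a)
        # Project each point onto the axis perpendicular to text lines
        # at angle a (i.e., rotate by -a and keep the y-coordinate).
        proj = points[:, 1] * np.cos(t) - points[:, 0] * np.sin(t)
        hist, _ = np.histogram(proj, bins=n_bins)
        energy = np.sum(hist.astype(np.float64) ** 2)  # sharp peaks => aligned rows
        if energy > best_energy:
            best_angle, best_energy = a, energy
    return best_angle
```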


International Conference on Document Analysis and Recognition | 1993

Performance metrics for document understanding systems

Junichi Kanai; Thomas A. Nartker; Stephen V. Rice; George Nagy

Requirements for the objective evaluation of automated data-entry systems are presented. Because the cost of correcting errors dominates the document conversion process, the most important characteristic of an OCR device is accuracy. However, different measures of accuracy (error metrics) are appropriate for different applications and for the character, word, text-line, text-block, and document levels. For wholly objective assessment, OCR devices must be tested under programmed, rather than interactive, control.
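
At the character level, for example, accuracy is commonly defined from the minimum number of edit operations needed to correct the OCR output. A sketch, assuming the ground truth is available as a plain string:

```python
def edit_distance(truth: str, ocr: str) -> int:
    """Minimum insertions, deletions, and substitutions (Levenshtein),
    computed with the classic dynamic program."""
    prev = list(range(len(ocr) + 1))
    for i, ct in enumerate(truth, 1):
        cur = [i]
        for j, co in enumerate(ocr, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (ct != co)))   # substitute
        prev = cur
    return prev[-1]

def character_accuracy(truth: str, ocr: str) -> float:
    """(n - #errors) / n, where n is the ground-truth length."""
    n = len(truth)
    return (n - edit_distance(truth, ocr)) / n
```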


International Conference on Document Analysis and Recognition | 1995

Adaptive image restoration of text images that contain touching or broken characters

Peter Stubberud; Junichi Kanai; Venugopal Kalluri

To improve the performance of an optical character recognition (OCR) system, an adaptive technique that restores touching or broken character images is proposed. By using the output from an OCR system and a distorted text image, this technique trains an adaptive restoration filter and then applies the filter to the distorted text image that the OCR system could not recognize. To demonstrate the performance of this technique, two synthesized images containing only touching characters and two synthesized images containing only broken characters were processed. The results show that this technique can improve both pixel and character accuracy of text images containing touching or broken characters.
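
As a rough sketch of the train-then-apply idea: a linear k×k restoration filter can be fitted by least squares to map the distorted image toward an ideal image (for example, one re-rendered from the characters the OCR system did recognize). The paper's adaptive filter need not be this simple linear one; the snippet only illustrates the structure of the approach.

```python
import numpy as np
from scipy.ndimage import correlate

def train_restoration_filter(distorted, ideal, k=5):
    """Fit a k x k linear filter w minimizing
    ||correlate(distorted, w) - ideal|| in the least-squares sense."""
    pad = k // 2
    D = np.pad(distorted.astype(np.float64), pad)
    H, W = ideal.shape
    # One row per pixel: its k x k neighborhood in the distorted image.
    A = np.array([D[y:y + k, x:x + k].ravel()
                  for y in range(H) for x in range(W)])
    w, *_ = np.linalg.lstsq(A, ideal.astype(np.float64).ravel(), rcond=None)
    return w.reshape(k, k)

def restore(distorted, w):
    """Apply the trained filter to a distorted text image."""
    return correlate(distorted.astype(np.float64), w, mode='constant')
```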


Electronic Imaging: Science and Technology | 1996

Evaluation of document image skew estimation techniques

Andrew D. Bagdanov; Junichi Kanai

Recently there has been an increased interest in document image skew detection algorithms. Most of the papers relevant to this problem include some experimental results, but there is no universally accepted methodology for evaluating the performance of such algorithms. We implemented four types of skew detection algorithms in order to investigate possible testing methodologies, and tested each algorithm on a sample of 460 page images randomly selected from a collection of approximately 100,000 pages. This collection contains a wide variety of typographical features and styles. In our evaluation we examine several issues relevant to the establishment of a uniform testing methodology. First, we begin with a clear definition of the problem and the ground-truth collection process. Then we examine the need for pre-processing and parameter optimization specific to each technique. Next, we investigate the problem of establishing meaningful statistical measurements of the performance of these algorithms and the use of non-parametric methods to perform pairwise comparisons of the algorithms. Lastly, we look at the sensitivity of each algorithm to particular typographical features, which indicates the need for a stratified sampling paradigm for accurate analysis of performance.
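
For the pairwise comparisons, a non-parametric paired test is the natural tool. The sketch below uses the Wilcoxon signed-rank test from SciPy on per-page absolute errors; whether this is the exact test used in the paper is an assumption here.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_algorithms(errs_a, errs_b, alpha=0.05):
    """Paired non-parametric comparison of two skew detectors.

    errs_a, errs_b: absolute skew-estimation errors (degrees) on the
    same sample of pages, paired page by page."""
    stat, p = wilcoxon(errs_a, errs_b)
    if p >= alpha:
        return f"no significant difference (p = {p:.4f})"
    better = "A" if np.median(errs_a) < np.median(errs_b) else "B"
    return f"algorithm {better} has significantly lower error (p = {p:.4f})"
```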


International Journal of Pattern Recognition and Artificial Intelligence | 1994

An algorithm for matching OCR-generated text strings

Stephen V. Rice; Junichi Kanai; Thomas A. Nartker

When optical character recognition (OCR) devices process the same page image, they generate similar text strings; the differences are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OCR devices can be quickly matched by detecting long common substrings. An algorithm for matching an arbitrary number of strings based on this principle is presented. Although its worst-case performance is O(Nn²), its performance in practice has been observed to be O(Nn log n), where n is the length of a string. This algorithm has been successfully used to study OCR errors, to determine the accuracy of OCR devices, and to implement a voting algorithm.
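
For two strings (the N = 2 case), the principle is easy to demonstrate: split both strings at their longest common substring and recurse on the unmatched pieces. The sketch below leans on difflib's find_longest_match rather than the paper's own matching routine, and the min_len cutoff is an illustrative parameter.

```python
from difflib import SequenceMatcher

def align(a: str, b: str, min_len: int = 4):
    """Return (a_segment, b_segment) pairs: identical pairs come from
    long common substrings; mismatched pairs are the OCR differences."""
    m = SequenceMatcher(a=a, b=b, autojunk=False) \
        .find_longest_match(0, len(a), 0, len(b))
    if m.size < min_len:
        return [(a, b)] if (a or b) else []
    return (align(a[:m.a], b[:m.b], min_len)
            + [(a[m.a:m.a + m.size], b[m.b:m.b + m.size])]
            + align(a[m.a + m.size:], b[m.b + m.size:], min_len))

# e.g. align("recognition", "recogmtion") isolates the "ni" / "m" confusion
```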


International Journal of Imaging Systems and Technology | 1996

Automated performance evaluation of document image analysis systems: Issues and practice

Junichi Kanai

The performance of document image analysis systems is affected by a variety of variables that alter the quality of documents. Objective evaluation and characterization of systems usually require large quantities of test data, and it is important to automate evaluation processes. In this article, issues in designing tools for automated evaluation of document image analysis techniques and systems are discussed, and some examples are presented.


Visual Information Processing Conference | 1998

Edge enhancement of remote sensing image data in the DCT domain

Biao Chen; Shahram Latifi; Junichi Kanai

Edge enhancement is an important image processing method for remote sensing image data. Many such images are compressed with the JPEG standard, which uses the Discrete Cosine Transform (DCT), and manipulating data directly in the DCT domain is an efficient way to save computing resources. In this paper, new algorithms for edge enhancement of remote sensing image data in the DCT domain are developed and implemented in three steps: highpass filtering, adding back all or part of the gray levels of the original image, and linear contrast stretching. In addition, a method to approximate the minimum (MIN) and maximum (MAX) gray-level intensities, which are needed for contrast stretching, is presented. Experimental results show that the quality of images generated by the new algorithms is comparable with that of images generated by the corresponding methods in the spatial domain.
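
A minimal sketch of the three steps, using a whole-image DCT from SciPy instead of the JPEG 8×8 block structure, and the exact image minimum/maximum instead of the paper's MIN/MAX approximation; cutoff and add_back are illustrative parameters.

```python
import numpy as np
from scipy.fft import dctn, idctn

def enhance(img, cutoff=8, add_back=1.0):
    """DCT-domain edge enhancement: highpass filter, add back the
    original gray levels, then linear contrast stretch."""
    C = dctn(img.astype(np.float64), norm='ortho')
    H = C.copy()
    H[:cutoff, :cutoff] = 0.0          # zero low-frequency coefficients
    high = idctn(H, norm='ortho')      # highpass-filtered image
    out = high + add_back * img        # add back full or part of the original
    lo, hi = out.min(), out.max()      # linear contrast stretch to [0, 255]
    return (out - lo) / max(hi - lo, 1e-9) * 255.0
```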


IS&T/SPIE's Symposium on Electronic Imaging: Science & Technology | 1995

Preliminary evaluation of histogram-based binarization algorithms

Junichi Kanai; Kevin O. Grover

To date, most optical character recognition (OCR) systems process binary document images, and the quality of the input image strongly affects their performance. Since a binarization process is inherently lossy, different algorithms typically produce different binary images from the same gray scale image. The objective of this research is to study the effects of global binarization algorithms on the performance of OCR systems. Several binarization methods were examined: the best fixed threshold value for the data set, the ideal histogram method, and Otsu's algorithm. Four contemporary OCR systems and 50 hard copy pages containing 91,649 characters were used in the experiments. These pages were digitized at 300 dpi and 8 bits/pixel, and 36 different threshold values (ranging from 59 to 199 in increments of 4) were used. The resulting 1,800 binary images were processed by all four OCR systems. All systems made approximately 40% more errors from images generated by Otsu's method than from those of the ideal histogram method. Two of the systems made approximately the same number of errors from images generated by the best fixed threshold value and by Otsu's method.
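
Otsu's algorithm itself is compact: it picks the threshold that maximizes the between-class variance of the gray-level histogram. A standard implementation for an 8-bit image:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method for a uint8 grayscale image: return the threshold
    maximizing the between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                   # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))     # class-0 cumulative mean
    mu_t = mu[-1]                          # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```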


Document Analysis Systems | 1998

Document Image Analysis Using a New Compression Algorithm

Shulan Deng; Shahram Latifi; Junichi Kanai

By proper exploitation of the structural characteristics existing in a compressed document, it is possible to speed up certain image processing operations. Alternatively, one can derive a compression scheme which would lend itself to an efficient manipulation of documents without compromising the compression factor. Here, a run-based compression technique is discussed for binary documents. The technique, in addition to achieving bit rates comparable to other compression schemes, preserves document features which are useful for analysis and manipulation of data. Algorithms are proposed to perform vertical run extraction, and similar operations in the compressed domain. These algorithms are implemented in software. Experimental results indicate that fast analysis of electronic data is possible if data is coded according to the proposed scheme.
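
The flavor of such compressed-domain processing can be sketched as follows: encode each row as black runs, then extract vertical runs directly from the run lists without reconstructing the bitmap. The encoding here is plain row-wise run-length coding, a simplification of the paper's scheme.

```python
def rle_row(row):
    """Encode one binary row as a list of (start, length) black runs."""
    runs, start = [], None
    for i, px in enumerate(row):
        if px and start is None:
            start = i
        elif not px and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(row) - start))
    return runs

def vertical_runs(encoded_rows):
    """Return (column, start_row, length) vertical black runs, computed
    directly from the per-row run lists."""
    active, out = {}, []          # active: column -> first row of its run
    for y, runs in enumerate(encoded_rows):
        covered = set()
        for s, length in runs:
            covered.update(range(s, s + length))
        for c in [c for c in active if c not in covered]:
            out.append((c, active[c], y - active[c]))   # run ended above row y
            del active[c]
        for c in covered:
            active.setdefault(c, y)                     # run starts at row y
    n = len(encoded_rows)
    out.extend((c, y0, n - y0) for c, y0 in active.items())
    return out

# usage: vertical_runs([rle_row(r) for r in binary_image])  # rows of 0/1
```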

Collaboration


Dive into Junichi Kanai's collaborations.

Top Co-Authors

George Nagy

Rensselaer Polytechnic Institute
