Jiangying Zhou
Panasonic
Publication
Featured research published by Jiangying Zhou.
International Conference on Document Analysis and Recognition | 1997
Jiangying Zhou; Daniel P. Lopresti
The authors examine the problem of locating and extracting text from images on the World Wide Web. They describe a text detection algorithm which is based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns text-like components into text lines. Experimental results suggest this approach is promising despite the challenging nature of the input data.
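The connected-component step described above can be illustrated with a minimal sketch. It assumes the image has already been quantized into a 2D grid of color-class labels; the function names and every threshold in `looks_like_text` are hypothetical and are not taken from the paper.

```python
from collections import deque

def connected_components(labels):
    """Return the 4-connected components of a 2D grid of color-class labels.

    Each component is a (color_class, pixels) pair, where pixels is a list
    of (row, col) coordinates.
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            cls = labels[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            # Breadth-first flood fill restricted to pixels of the same class.
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append((cls, pixels))
    return components

def looks_like_text(pixels, image_area, min_area=3, max_aspect=5.0, max_extent=0.5):
    """Hypothetical shape test; the thresholds are illustrative only."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(height, width) / min(height, width)
    extent = (height * width) / image_area  # share of the image the box covers
    return len(pixels) >= min_area and aspect <= max_aspect and extent <= max_extent

# Tiny example: class 0 is background, class 1 forms two glyph-like blobs.
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
components = connected_components(grid)
text_like = [comp for comp in components if looks_like_text(comp[1], image_area=30)]
```

In this toy grid the background forms one large component that the extent test rejects, while the two small blobs pass. A real system would tune such tests on actual Web images.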
Information Retrieval | 2000
Daniel P. Lopresti; Jiangying Zhou
The explosive growth of the World Wide Web has resulted in a distributed database consisting of hundreds of millions of documents. While existing search engines index a page based on the text that is readily extracted from its HTML encoding, an increasing amount of the information on the Web is embedded in images. This situation presents a new and exciting challenge for the fields of document analysis and information retrieval, as WWW image text is typically rendered in color and at very low spatial resolutions. In this paper, we survey the results of several years of our work in the area. For the problem of locating text in Web images, we describe a procedure based on clustering in color space followed by a connected-components analysis that seems promising. For character recognition, we discuss techniques using polynomial surface fitting and “fuzzy” n-tuple classifiers. Also presented are the results of several experiments that demonstrate where our methods perform well and where more work needs to be done. We conclude with a discussion of topics for further research.
Computer Vision and Image Understanding | 1997
Daniel P. Lopresti; Jiangying Zhou
We present experimental results suggesting that between 20% and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).
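As a rough illustration of the voting idea, the sketch below takes a position-wise majority vote over three OCR readings. It is a deliberate simplification: it assumes the readings are already aligned and of equal length (substitution errors only), whereas the consensus sequence procedure in the abstract must also cope with insertions and deletions. The function name `consensus_vote` is hypothetical.

```python
from collections import Counter

def consensus_vote(readings):
    """Position-wise majority vote over equal-length OCR outputs.

    Assumption: readings are pre-aligned, so only substitution errors occur.
    When no character has a strict majority, the first reading's character
    is kept.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("this sketch requires pre-aligned, equal-length readings")
    result = []
    for column in zip(*readings):
        char, count = Counter(column).most_common(1)[0]
        result.append(char if count > len(readings) / 2 else column[0])
    return "".join(result)

# Three scans of the same word, each with at most one substitution error.
voted = consensus_vote(["recognition", "recogmition", "rec0gnition"])
```

Because the two errors fall at different positions, each is outvoted by the other two readings.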
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998
Prateek Sarkar; George Nagy; Jiangying Zhou; Daniel P. Lopresti
The bitmap obtained by scanning a printed pattern depends on the exact location of the scanning grid relative to the pattern. We consider ideal sampling with a regular lattice of delta functions. The displacement of the lattice relative to the pattern is random and obeys a uniform probability density function defined over a unit cell of the lattice. Random-phase sampling affects the edge-pixels of sampled patterns. The resulting number of distinct bitmaps and their relative frequencies can be predicted from a mapping of the original pattern boundary to the unit cell (called a module-grid diagram). The theory is supported by both simulated and experimental results. The module-grid diagram may be useful in helping to understand the effects of edge-pixel variation on optical character recognition.
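A one-dimensional analogue may make the effect concrete. The sketch below samples an interval of a given width with a unit-spaced grid whose phase is swept uniformly over one cell, then tallies the distinct bitmaps and their relative frequencies. The 1D setting, the function names, and the numeric sweep are assumptions made for illustration; the paper treats two-dimensional patterns.

```python
from collections import Counter
import math

def bitmap_1d(width, phase, n_pixels):
    """Sample the interval [0, width) at the points k + phase, k = 0..n_pixels-1."""
    return tuple(1 if (k + phase) < width else 0 for k in range(n_pixels))

def bitmap_frequencies(width, steps=1000):
    """Relative frequency of each distinct bitmap as the phase sweeps [0, 1)."""
    n_pixels = math.ceil(width) + 1
    counts = Counter(
        bitmap_1d(width, (i + 0.5) / steps, n_pixels) for i in range(steps)
    )
    return {bitmap: count / steps for bitmap, count in counts.items()}

# An interval of width 2.3 yields two bitmaps: three "on" pixels when the
# phase is below 0.3, and two "on" pixels otherwise.
freqs = bitmap_frequencies(2.3)
```

Here the pattern's right edge, taken modulo the grid spacing, splits the unit cell into two regions, and the length of each region gives the frequency of the corresponding bitmap.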
Pattern Recognition | 1997
Jiangying Zhou; Daniel P. Lopresti
In traditional pattern recognition, the classification decision is based on a single observation of the input. In this paper, we show that by relaxing this assumption, the performance of the classifier can be improved substantially. We present a detailed analysis of one particular method for achieving this: taking a consensus vote on the classifier's output for repeated samples of the input. We prove that this approach always yields a net improvement in recognition accuracy for common distributions of interest. Upper and lower bounds on the improvement are also discussed. Under certain conditions, it is even possible to “beat” the Bayes error bound associated with the classifier. We conclude by presenting results from three sets of experiments examining the effectiveness of the idea.
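A simplified calculation shows why voting can help. The sketch assumes each repeated sample is classified correctly with the same probability `p`, independently of the others, and counts only strict majorities as correct. These assumptions and the name `majority_accuracy` are illustrative; the paper's analysis covers more general distributions.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that more than half of n independent trials are correct,
    when each trial is correct with probability p (binomial model)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

single = majority_accuracy(0.9, 1)   # one observation: accuracy equals p
voted3 = majority_accuracy(0.9, 3)   # 3 * 0.9**2 * 0.1 + 0.9**3 = 0.972
```

In this binary model the vote only helps when `p` exceeds 0.5; for example, `majority_accuracy(0.4, 3)` is 0.352, which is worse than a single observation.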
Proceedings of SPIE | 1998
Jiangying Zhou; Daniel P. Lopresti; Tolga Tasdizen
In this paper, we consider the problem of locating and extracting text from WWW images. A previous algorithm based on color clustering and connected components analysis works well as long as the color of each character is relatively uniform and the typography is fairly simple. It breaks down quickly, however, when these assumptions are violated. Here, we describe more robust techniques for dealing with this challenging problem. We present an improved color clustering algorithm that measures similarity based on both RGB and spatial proximity. Layout analysis is also incorporated to handle more complex typography. These changes significantly enhance the performance of our text detection procedure.
Archive | 1997
Jiangying Zhou; Daniel P. Lopresti
Archive | 1998
Daniel P. Lopresti; Jeffrey Esakov; Jiangying Zhou
Archive | 1995
Daniel P. Lopresti; Jeffrey Esakov; Jiangying Zhou
Archive | 2001
Jiangying Zhou; Daniel P. Lopresti; Andrew Tomkins