Ihsin T. Phillips
University of Washington
Publications
Featured research published by Ihsin T. Phillips.
international conference on document analysis and recognition | 1993
Tapas Kanungo; Robert M. Haralick; Ihsin T. Phillips
Two sources of document degradation are modeled: (i) perspective distortion that occurs while photocopying or scanning thick, bound documents, and (ii) degradation due to perturbations in the optical scanning and digitization process: speckle, blur, jitter, and thresholding. Perspective distortion is modeled by studying the underlying perspective geometry of the optical system of photocopiers and scanners. An illumination model is described to account for the nonlinear intensity change occurring across a page in a perspective-distorted document. The optical distortion process is modeled morphologically. First, a distance transform on the foreground is performed, followed by a random inversion of binary pixels in which the probability of a flip is a function of the pixel's distance to the boundary of the foreground. Correlation among the flipped pixels is modeled by a morphological closing operation.
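The morphological part of this degradation model can be sketched as follows. This is a minimal illustration, not the paper's implementation: the flip-probability form (an exponential decay in the squared distance plus a constant) and all parameter values are assumptions, and a small 2x2 structuring element stands in for whatever closing element the authors used.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, binary_closing

def degrade(binary_img, alpha0=1.0, alpha=1.5, beta0=1.0, beta=1.5,
            eta=0.05, seed=0):
    """Degradation sketch: flip each pixel with a probability that
    decays with its distance to the foreground/background boundary,
    then correlate the flipped pixels with a morphological closing.
    Parameter names and values are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    fg = binary_img.astype(bool)           # True = foreground (ink)
    d_fg = distance_transform_edt(fg)      # distance inside foreground
    d_bg = distance_transform_edt(~fg)     # distance inside background
    # flip probability as a function of distance to the boundary
    p = np.where(fg,
                 alpha0 * np.exp(-alpha * d_fg**2) + eta,
                 beta0 * np.exp(-beta * d_bg**2) + eta)
    flips = rng.random(binary_img.shape) < p
    noisy = fg ^ flips                     # random inversion of pixels
    # closing correlates the otherwise independent flips
    return binary_closing(noisy, structure=np.ones((2, 2)))
```

The closing at the end is what turns pixel-independent noise into the clumped, correlated degradation seen in real scans.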
international conference on document analysis and recognition | 1995
Jaekyu Ha; Robert M. Haralick; Ihsin T. Phillips
A top-down page segmentation technique known as the recursive X-Y cut decomposes a document image recursively into a set of rectangular blocks. This paper proposes that the recursive X-Y cut be implemented using the bounding boxes of connected components of black pixels instead of the image pixels themselves. The advantage is a large reduction in computation. In fact, once the bounding boxes of connected components are obtained, the recursive X-Y cut completes in on the order of a second on Sparc-10 workstations for letter-sized document images scanned at 900 dpi resolution.
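A recursive X-Y cut over bounding boxes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the `min_gap` threshold and the choice of cutting at the first sufficiently wide gap (rather than, say, the widest) are assumptions.

```python
def xy_cut(boxes, min_gap=10):
    """Recursive X-Y cut over bounding boxes (x0, y0, x1, y1).
    min_gap is an illustrative threshold, not a value from the paper."""
    if len(boxes) <= 1:
        return [boxes]

    def gaps(intervals):
        """Merge 1-D intervals; return the gaps between merged runs."""
        merged = []
        for a, b in sorted(intervals):
            if merged and a <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], b)
            else:
                merged.append([a, b])
        return [(merged[i][1], merged[i + 1][0])
                for i in range(len(merged) - 1)]

    for axis in (0, 1):                    # 0: cut vertically, 1: horizontally
        ivs = [(b[axis], b[axis + 2]) for b in boxes]
        wide = [(s, e) for s, e in gaps(ivs) if e - s >= min_gap]
        if wide:
            cut = wide[0][0]               # first sufficiently wide gap
            first = [b for b in boxes if b[axis + 2] <= cut]
            rest = [b for b in boxes if b[axis + 2] > cut]
            return xy_cut(first, min_gap) + xy_cut(rest, min_gap)
    return [boxes]                         # no wide gap: homogeneous block
```

Because only box coordinates are touched, each cut costs O(n log n) in the number of components rather than a pass over millions of pixels, which is the source of the speedup the paper reports.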
Pattern Recognition | 2006
Yalin Wang; Ihsin T. Phillips; Robert M. Haralick
This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone with a 25-dimensional feature vector. An optimized decision tree classifier is used to classify each zone into one of nine zone content classes. A performance evaluation protocol is proposed. The training and testing data sets include a total of 24,177 zones from the University of Washington English Document Image Database III. The algorithm accuracy is 98.45% with a mean false alarm rate of 0.50%.
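The pipeline shape — a fixed-length feature vector per zone, fed to a decision tree — can be illustrated with a toy version. The four features, the two class names, and the synthetic cluster centers below are all illustrative stand-ins; the paper uses 25 features and nine UW-III zone classes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic "zones", one row per zone:
# [run-length mean, run-length variance, black-pixel fraction, width ratio]
text     = rng.normal([3, 1, 0.10, 0.8], 0.05, size=(50, 4))
halftone = rng.normal([20, 30, 0.55, 0.5], 0.05, size=(50, 4))
X = np.vstack([text, halftone])
y = np.array(["text"] * 50 + ["halftone"] * 50)

# A shallow decision tree is enough to separate these toy clusters.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[3, 1, 0.1, 0.8]])[0])  # → text
```

A decision tree is a natural fit here because the zone classes differ sharply on a few features (halftones have long, high-variance runs and a large black fraction), so a few axis-aligned splits go a long way.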
workshop on applications of computer vision | 1996
Jisheng Liang; Jaekyu Ha; Robert M. Haralick; Ihsin T. Phillips
The paper presents an efficient technique for document page layout structure extraction and classification by analyzing the spatial configuration of the bounding boxes of different entities on the given image. The algorithm segments an image into a list of homogeneous zones. The classification algorithm labels each zone as text, table, line-drawing, halftone, ruling, or noise. Text lines and words are extracted within text zones, and neighboring text lines are merged to form text blocks. The tabular structure is further decomposed into row and column items. Finally, the document layout hierarchy is produced from these extracted entities.
International Journal of Imaging Systems and Technology | 1994
Tapas Kanungo; Robert M. Haralick; Ihsin T. Phillips
Two sources of document degradation are modeled: (1) perspective distortion that occurs while photocopying or scanning thick, bound documents; and (2) degradation due to perturbations in the optical scanning and digitization process: speckle, blur, jitter, and thresholding. Perspective distortion is modeled by studying the underlying perspective geometry of the optical system of photocopiers and scanners. An illumination model is described to account for the nonlinear intensity change occurring across a page in a perspective-distorted document. The optical distortion process is modeled morphologically. First, a distance transform on the foreground is performed; this is followed by a random inversion of binary pixels in which the probability of flip is a function of the distance of the pixel to the boundary of the foreground. Correlating the flipped pixels is modeled by a morphological closing operation. © 1994 John Wiley & Sons, Inc.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001
Jisheng Liang; Ihsin T. Phillips; Robert M. Haralick
In this paper, we give a formal definition of a document image structure representation and formulate document image structure extraction as a partitioning problem: finding an optimal partition of the set of glyphs of an input document image into a hierarchical tree structure in which entities at each level have similar physical properties and compatible semantic labels. We present a unified methodology that is applicable to the construction of document structures at different hierarchical levels. An iterative, relaxation-like method is used to find a partition that maximizes the probability of the extracted structure. All the probabilities used in the partitioning process are estimated from an extensive training set using various kinds of measurements among the entities within the hierarchy. The offline probabilities estimated during training then drive all decisions in the online document structure extraction. We have implemented a text line extraction algorithm using this framework.
Pattern Recognition | 2004
Yalin Wang; Ihsin T. Phillips; Robert M. Haralick
This paper presents a table structure understanding algorithm designed using optimization methods. The algorithm is probability based, where the probabilities are estimated from geometric measurements made on the various entities in a large training set. The methodology includes a global parameter optimization scheme, a novel automatic table ground truth generation system, and a table structure understanding performance evaluation protocol. On a document data set containing 518 table and 10,934 cell entities, it achieved an accuracy of 96.76% at the cell level and 98.32% at the table level.
international conference on document analysis and recognition | 1995
Jaekyu Ha; Robert M. Haralick; Ihsin T. Phillips
This paper describes a method for extracting words, text lines, and text blocks by analyzing the spatial configuration of bounding boxes of connected components on a given document image. The basic idea is that connected components of black pixels can be used as the computational units in document image analysis. In this paper, the problem of extracting words, text lines, and text blocks is viewed as a clustering problem in the 2-dimensional discrete domain. Our main strategy is to use projection-profile analysis to measure the horizontal or vertical gaps between (groups of) components during image segmentation. For this purpose, we compute the smallest rectangular box, called the bounding box, that circumscribes each connected component. These boxes are projected horizontally and/or vertically, and local and global projection profiles are analyzed for word, text-line, and text-block segmentation. In the last step of segmentation, the document decomposition hierarchy is produced from these segmented objects.
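The gap-measuring step can be sketched as follows: project the bounding boxes onto one axis, then cut wherever the profile stays empty for long enough. The `min_gap` threshold and the one-cut-per-gap rule are illustrative assumptions, not values from the paper.

```python
import numpy as np

def segment_gaps(boxes, axis=0, min_gap=8):
    """Split bounding boxes (x0, y0, x1, y1) wherever the projection
    profile along the chosen axis stays empty for >= min_gap pixels.
    min_gap is an illustrative threshold, not from the paper."""
    lo = min(b[axis] for b in boxes)
    hi = max(b[axis + 2] for b in boxes)
    profile = np.zeros(hi - lo, dtype=int)
    for b in boxes:
        profile[b[axis] - lo:b[axis + 2] - lo] += 1  # box coverage count
    cuts, run = [], 0
    for i, empty in enumerate(profile == 0):
        run = run + 1 if empty else 0
        if run == min_gap:                 # one cut per long-enough gap
            cuts.append(lo + i)
    segments = {}
    for b in boxes:
        key = sum(c < b[axis] for c in cuts)         # segment index
        segments.setdefault(key, []).append(b)
    return [segments[k] for k in sorted(segments)]
```

Run along the x-axis with a small threshold this separates words on a line; along the y-axis with a larger threshold it separates text lines or blocks, which is the local/global profile distinction the abstract describes.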
Computer Vision and Image Understanding | 2001
Jisheng Liang; Ihsin T. Phillips; Robert M. Haralick
This paper presents a performance metric for document structure extraction algorithms based on finding the correspondences between detected entities and the ground truth. We describe a method for determining an algorithm's optimal tuning parameters. We evaluate a group of document layout analysis algorithms on 1600 images from the UW-III Document Image Database and report quantitative performance measures in terms of the rates of correct, missed, false, merged, split, and spurious detections.
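The correspondence-based scoring can be sketched by classifying each ground-truth entity according to how many detections it overlaps, and vice versa. This is a simplified illustration: a bare intersection test stands in for whatever area-overlap thresholds the paper actually uses, and the spurious category is folded into the others.

```python
def correspondence_counts(truth, detected):
    """Count detection outcomes by overlap pattern. Entities are
    bounding boxes (x0, y0, x1, y1); the plain intersection test is an
    illustrative stand-in for the paper's overlap criteria."""
    def overlaps(a, b):
        return (min(a[2], b[2]) > max(a[0], b[0]) and
                min(a[3], b[3]) > max(a[1], b[1]))
    t_hits = [[d for d in detected if overlaps(t, d)] for t in truth]
    d_hits = [[t for t in truth if overlaps(t, d)] for d in detected]
    counts = {"correct": 0, "miss": 0, "false": 0, "split": 0, "merge": 0}
    for hits in t_hits:
        if not hits:
            counts["miss"] += 1            # truth entity never detected
        elif len(hits) > 1:
            counts["split"] += 1           # one truth, several detections
        elif len(d_hits[detected.index(hits[0])]) == 1:
            counts["correct"] += 1         # clean one-to-one match
        else:
            counts["merge"] += 1           # detection spans several truths
    counts["false"] = sum(1 for h in d_hits if not h)
    return counts
```

Counting splits and merges separately from plain misses matters in layout analysis: a segmenter that merges two columns into one block fails very differently from one that misses a block entirely.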
international conference on document analysis and recognition | 1995
Ramaswamy Sivaramakrishnan; Ihsin T. Phillips; Jaekyu Ha; Suresh Subramanium; Robert M. Haralick
A document can be divided into zones on the basis of its content. For example, a zone can be either text or non-text. This paper describes an algorithm to classify each given document zone into one of nine different classes. Features such as run-length mean and variance, spatial mean and variance, the fraction of black pixels in the zone, and the zone width ratio are extracted for each zone. Run-length-related features are computed along four different canonical directions. A decision tree classifier is used to assign a zone class on the basis of its feature vector. The accuracy on an independent test set was 97%.
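A reduced version of the feature extraction can be sketched as follows — run-length statistics along one direction plus the black-pixel fraction. The paper computes run lengths along four canonical directions and uses more features; this illustration covers only the horizontal direction.

```python
import numpy as np

def zone_features(zone):
    """Compute a few illustrative zone features: mean and variance of
    horizontal black runs, and the fraction of black pixels.
    (The paper uses four run directions and additional features.)"""
    runs = []
    for row in zone:
        run = 0
        for px in row:
            if px:
                run += 1                   # extend the current black run
            elif run:
                runs.append(run)           # run ended at a white pixel
                run = 0
        if run:
            runs.append(run)               # run reaching the row's end
    runs = np.array(runs) if runs else np.array([0])
    return {"rl_mean": runs.mean(),
            "rl_var": runs.var(),
            "black_frac": np.mean(np.asarray(zone) != 0)}
```

These statistics separate classes well because, for instance, body text produces many short runs with low variance, while halftones and rulings produce long or highly variable runs.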