Dan S. Bloomberg | Researchain

Publication

Featured researches published by Dan S. Bloomberg.

international conference on acoustics, speech, and signal processing | 1993

Word spotting in scanned images using hidden Markov models

Francine Chen; Lynn Wilcox; Dan S. Bloomberg

A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
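The Viterbi search named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the two-state network (state 0 standing in for a non-keyword filler, state 1 for a keyword), the probabilities, and the function name `viterbi` are all assumptions chosen for illustration.

```python
import math


def viterbi(obs_loglik, log_trans, log_init):
    """Return (best_log_score, best_state_path) for one observation sequence.

    obs_loglik[t][s]: log-likelihood of the feature vector at column t in state s.
    log_trans[r][s]:  log-probability of moving from state r to state s.
    log_init[s]:      log-probability of starting in state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] = best predecessor of state s at column t
    for t in range(1, len(obs_loglik)):
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best_r = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best_r)
            score.append(prev[best_r] + log_trans[best_r][s] + obs_loglik[t][s])
        back.append(pointers)
    # Trace the best final state back to the first column.
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best, path


# Toy example (hypothetical numbers): columns 1 and 2 look like the keyword.
log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.8), log(0.2)], [log(0.2), log(0.8)]]
obs_loglik = [
    [log(0.9), log(0.1)],
    [log(0.1), log(0.9)],
    [log(0.1), log(0.9)],
    [log(0.9), log(0.1)],
]
best, path = viterbi(obs_loglik, log_trans, log_init)
```

In this toy run the decoded path marks the middle two columns as keyword columns, which is the sense in which a Viterbi pass both detects and locates a keyword.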

international conference on document analysis and recognition | 1993

Detecting and locating partially specified keywords in scanned images using hidden Markov models

Francine Chen; Lynn Wilcox; Dan S. Bloomberg

A hidden Markov model (HMM) based system for detecting and locating, or spotting, user-specified keywords in scanned images is described. The system is font-independent, and no pre-segmentation of text and graphics is required. The bounding boxes of potential lines of text are extracted from the image using morphology. Feature vectors based on the external shape and internal structure of characters are computed for each bounding box. A keyword HMM is created by concatenating appropriate context-dependent character HMMs. The non-keyword HMM is based on context-dependent sub-character models. Keywords are spotted using Viterbi decoding on an HMM network created from the keyword and non-keyword HMMs. This model allows detection of keywords embedded in a line without pre-segmentation of the line into words or characters. Thus keywords may be specified by a baseform and variants of the keyword can be detected.

Computer Vision and Image Understanding | 1998

Summarization of Imaged Documents without OCR

Francine Chen; Dan S. Bloomberg

A system is presented for creating a summary indicating the contents of an imaged document. The summary is composed from selected regions extracted from the imaged document. The regions may include sentences, key phrases, headings, and figures. The extracts are identified without the use of optical character recognition. The imaged document is first processed to identify the word-bounding boxes, the reading order of words, and the location of sentence and paragraph boundaries in the text. The word-bounding boxes are grouped into equivalence classes to mimic the terms in a text document. Equivalence classes representing content words are identified, and key phrases are identified from the set of content words. Summary sentences are selected using a statistically based classifier applied to a set of discrete sentence features. Evaluation of sentence selection against a set of abstracts created by a professional abstracting company is given.

Journal of Applied Physics | 1984

Nonlinear boundary element method for two‐dimensional magnetostatics

Meng H. Lean; Dan S. Bloomberg

An interface formulation, successfully used for the solution of magnetostatic fields in piecewise homogeneous media, is extended to nonlinear media problems. The advantage of boundary integral techniques, in the reduction of problem dimensionality, is retained. This nonlinear boundary element method (BEM) algorithm alternates solution of the augmented interface equation with satisfaction of the constitutive media relations. Use of the M‐B characteristics is shown to guarantee iteration stability because the slope approaches unity as μ tends to infinity. Isoparametric elements of second order are used for both geometry and sources to ensure high solution fidelity. Galerkin’s method is employed for discretization to matrix form. Each iteration requires only the product of the inverted system matrix with the augmented RHS. Examples include saturation of square and circular cylinders of high permeability in a uniform field. It is shown that with an optimal relaxation factor of 0.65, only 3–6 iterations are ...

international conference on document analysis and recognition | 1997

Extraction of indicative summary sentences from imaged documents

Francine Chen; Dan S. Bloomberg

A system for selecting sentences from an imaged document for presentation as part of a document summary is presented. The extracts are identified without the use of optical character recognition. The sentences are selected based on a set of discrete features characterizing the words within a sentence and the location of the sentence within the imaged document. Each sentence is scored based on the values of the discrete features using a statistically based classifier. The imaged document is processed to identify the word locations, the reading order of words, and the location of sentence and paragraph boundaries in the text. The words are grouped into equivalence classes to mimic the terms in a text document. A sample extract for a technical document is shown, and evaluation against a set of abstracts created by a professional abstracting company is given. These results are compared with text-based abstracts.

Journal of Electronic Imaging | 2000

Pattern matching using the blur hit-miss transform

Dan S. Bloomberg; Luc Vincent

The usefulness of the hit-miss transform (HMT) and related transforms for pattern matching in document image applications is examined. Although the HMT is sensitive to the types of noise found in scanned images, including both boundary and random noise, a simple extension, the blur HMT, is relatively robust. The noise immunity of the blur HMT derives from its ability to treat both types of noise together, and to remove them by appropriate dilations. In analogy with the Hausdorff metric for the distance between two sets, metric generalizations for special cases of the blur HMT are derived. Whereas Hausdorff uses both directions of the directed distances between two sets, a metric derived from a special case of the blur HMT uses just one direction of the directed distances between foreground and background components of two sets. For both foreground and background, the template is always the first of the directed sets. A less-restrictive metric generalization, where the disjoint foreground and background components of the template need not be set complements, is also derived. For images with a random component of noise, the blur HMT is sensitive only to the size of the noise, whereas Hausdorff matching is sensitive to its location. It is also shown how these metric functions can be derived from the distance functions of the foreground (FG) and background (BG) of an image, using dilation by the appropriate templates. The blur HMT can be used as a fast heuristic to avoid more expensive integer-based matching techniques, and it is implemented efficiently with boolean image operations. The FG and BG images are dilated with structuring elements that depend on image noise and pattern variability, and the results are then eroded with templates derived from patterns to be matched. Subsampling the patterns on a regular grid can improve speed and maintain match quality, and examples are given that indicate how to explore the parameter space. Truncated matches give the same result as full erosions, are much faster, and for some applications can be performed at a restricted set of locations.
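The dilate-then-erode recipe in this abstract can be sketched on a tiny binary image. This is an illustrative reading, not the paper's implementation: images are represented as sets of (row, column) pixels, the function names and the toy pattern are hypothetical, and pixels outside the image are assumed to count as neither foreground nor background.

```python
def dilate(pixels, structuring_element):
    """Binary dilation: union of the pixel set translated by each offset."""
    return {(r + dr, c + dc) for (r, c) in pixels for (dr, dc) in structuring_element}


def erode(pixels, template, domain):
    """Binary erosion: locations where every template offset lands in `pixels`."""
    return {
        (r, c)
        for (r, c) in domain
        if all((r + dr, c + dc) in pixels for (dr, dc) in template)
    }


def blur_hmt(fg, shape, hit, miss, blur_fg, blur_bg):
    """Dilate FG and BG by their blur elements, then erode by hit and miss templates."""
    rows, cols = shape
    domain = {(r, c) for r in range(rows) for c in range(cols)}
    bg = domain - fg
    fg_blurred = dilate(fg, blur_fg) & domain
    bg_blurred = dilate(bg, blur_bg) & domain
    return erode(fg_blurred, hit, domain) & erode(bg_blurred, miss, domain)


# Pattern: a 3-pixel horizontal bar with background directly above and below its centre.
hit = {(0, -1), (0, 0), (0, 1)}
miss = {(-1, 0), (1, 0)}
identity = {(0, 0)}                  # no blur: reduces to the ordinary HMT
horizontal = {(0, -1), (0, 0), (0, 1)}

# Noisy 5x5 image: the bar on row 2 has lost its middle pixel.
noisy_fg = {(2, 1), (2, 3)}

strict = blur_hmt(noisy_fg, (5, 5), hit, miss, identity, identity)
blurred = blur_hmt(noisy_fg, (5, 5), hit, miss, horizontal, identity)
```

With no blur the damaged bar is not matched at all. With a one-pixel horizontal blur on the foreground it is matched, at the cost of some positional slack around the true location.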

international conference on document analysis and recognition | 1995

A comparison of discrete and continuous hidden Markov models for phrase spotting in text images

Francine Chen; Lynn Wilcox; Dan S. Bloomberg

In spotting for phrases in text images, speed and accuracy are important considerations. In a hidden Markov model (HMM) based spotter, recognition time is dominated by the time required to compute the state conditional observation probabilities. These probabilities are a measure of how well the data match each state in the model. In this paper discrete and continuous hidden Markov models are compared based on speed and accuracy in spotting for phrases in text images. For the discrete HMM, vector quantization is used to associate each continuous feature vector with a discrete value. For the continuous HMMs, the observation distributions for the feature vectors are modeled by either a single Gaussian, or a mixture of two Gaussians. Comparisons were made on a subset of the UW English Document Image Database I. The best accuracy was observed when a mixture of two Gaussians was used in the continuous HMM. The discrete HMM provides for faster spotting, particularly when long phrases are used.
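The difference between the two kinds of observation model can be made concrete with a small sketch. Everything here is assumed for illustration: the function names, the squared-Euclidean nearest-codeword rule, and the diagonal covariance are not stated in the abstract.

```python
import math


def quantize(x, codebook):
    """Index of the codeword nearest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])),
    )


def discrete_loglik(x, codebook, state_log_probs):
    """Discrete HMM: quantize once, then a table lookup per state."""
    return state_log_probs[quantize(x, codebook)]


def gaussian_loglik(x, mean, var):
    """Continuous HMM, single Gaussian with diagonal covariance (assumed)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
        for a, m, v in zip(x, mean, var)
    )


def mixture_loglik(x, components):
    """Continuous HMM, Gaussian mixture; components are (weight, mean, var)."""
    terms = [math.log(w) + gaussian_loglik(x, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

The sketch suggests why the discrete model can be faster: the quantization step is shared by all states and each state then costs one lookup, whereas each continuous state evaluates one or two densities per feature vector.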

international conference on image processing | 1996

Document image summarization without OCR

Dan S. Bloomberg; Francine Chen

A system for selecting excerpts directly from imaged text without performing optical character recognition is described. The images are segmented to find text regions, text lines and words, and sentence and paragraph boundaries are identified. A set of word equivalence classes is computed based on the rank blur hit-miss transform. This information is used to identify stop words and keywords. Sentences for presentation as part of a summary are then selected based on keywords and on the location of the sentences.

IEEE Transactions on Magnetics | 1983

Readback bit shift with finite pole-length heads on perpendicular media

Dan S. Bloomberg; Meng H. Lean; G. Kelley

Perpendicular writing fields are calculated for finite pole-length gapped heads, with and without a permeable layer beneath the magnetic medium. These head fields are used to compute readback waveforms from ideal perpendicular transitions, which are detected both by zero-crossings and by inflection points of the waveform. Linear readback bit shift, given by the difference between detected and written transitions, is normalized to half the minimum transition spacing. For detection by waveform zero-crossing, bit shifts are unacceptably large with an underlayer. Without an underlayer, detection by inflection point is considerably better than by waveform zero-crossing. Surprisingly, bit shifts from inflection points are only marginally better without an underlayer than with one. A recording system with an underlayer may in fact give superior performance because of other contributions to the total bit shift.

international conference on document analysis and recognition | 2001

Document image decoding using Iterated Complete Path search with subsampled heuristic scoring

Dan S. Bloomberg; Thomas P. Minka; Kris Popat

It has been shown that the computation time of document image decoding can be significantly reduced by employing heuristics in the search for the best decoding of a text line. In the Iterated Complete Path (ICP) method, template matches are performed only along the best path found by dynamic programming on each iteration. When the best path stabilizes, the decoding is optimal and no more template matches need to be performed. In this way, only a tiny fraction of potential template matches must be evaluated, and the computation time is typically dominated by the evaluation of the initial heuristic upper bound for each template at each location in the image. The time to compute this bound depends on the resolution at which the matching scores are found. At lower resolution, the heuristic computation is reduced, but because a weaker bound is used, the number of Viterbi iterations is increased. We present the optimal (lowest upper-bound) heuristic for any degree of subsampling of multilevel template and/or interpolation, for use in text line decoding with ICP. The optimal degree of subsampling depends on image quality, but it is typically found that a small amount of template subsampling is effective in reducing the overall decoding time.
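The iteration described in this abstract can be sketched on a toy text line. This is a simplified reading under stated assumptions, not the published decoder: a line is a run of `width` columns, each template has a fixed width, and `upper_bound` and `exact_score` are hypothetical callables in which the bound is never below the exact score.

```python
def best_path(width, template_widths, score):
    """Dynamic programming over column positions; returns (total, [(x, k), ...])."""
    neg_inf = float("-inf")
    best = [neg_inf] * (width + 1)
    back = [None] * (width + 1)
    best[0] = 0.0
    for x in range(width):
        if best[x] == neg_inf:
            continue
        for k, w in enumerate(template_widths):
            if x + w <= width:
                candidate = best[x] + score(x, k)
                if candidate > best[x + w]:
                    best[x + w] = candidate
                    back[x + w] = (x, k)
    if best[width] == neg_inf:
        raise ValueError("no sequence of templates spans the line")
    path, x = [], width
    while x > 0:
        x_prev, k = back[x]
        path.append((x_prev, k))
        x = x_prev
    path.reverse()
    return best[width], path


def icp_decode(width, template_widths, upper_bound, exact_score):
    """Replace bounds by exact scores only along each iteration's best path."""
    exact = {}

    def current(x, k):
        if (x, k) in exact:
            return exact[(x, k)]
        return upper_bound(x, k)

    while True:
        total, path = best_path(width, template_widths, current)
        pending = [edge for edge in path if edge not in exact]
        if not pending:
            # Every score on the best path is exact, so the path is optimal.
            return total, path, len(exact)
        for x, k in pending:
            exact[(x, k)] = exact_score(x, k)


# Toy scores (arbitrary, for illustration only).
def exact_score(x, k):
    return -((x * 7 + k * 3) % 5)


def upper_bound(x, k):
    return exact_score(x, k) + 1.0  # a valid but loose bound


widths = [1, 2]
total, path, n_evaluated = icp_decode(6, widths, upper_bound, exact_score)
```

Because the bound never underestimates, any path that still contains unevaluated edges scores at least as well under the mixed scores as it would exactly, so a best path made only of exact scores cannot be beaten. The count `n_evaluated` shows how many exact template matches were needed.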
