Publication


Featured research published by Kris Popat.


european conference on information retrieval | 2002

A Hierarchical Model for Clustering and Categorising Documents

Eric Gaussier; Cyril Goutte; Kris Popat; Francine Chen

We propose a new hierarchical generative model for textual data, where words may be generated by topic specific distributions at any level in the hierarchy. This model is naturally well-suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Training algorithms are derived for both cases, and illustrated on real data by clustering news stories and categorising newsgroup messages. Finally, the generative model may be used to derive a Fisher kernel expressing similarity between documents.


conference on information and knowledge management | 2001

Text classification in a hierarchical mixture model for small training sets

Kristina Toutanova; Francine Chen; Kris Popat; Thomas Hofmann

Documents are commonly categorized into hierarchies of topics, such as the ones maintained by Yahoo! and the Open Directory project, in order to facilitate browsing and other interactive forms of information retrieval. In addition, topic hierarchies can be utilized to overcome the sparseness problem in text categorization with a large number of categories, which is the main focus of this paper. This paper presents a hierarchical mixture model which extends the standard naive Bayes classifier and previous hierarchical approaches. Improved estimates of the term distributions are made by differentiation of words in the hierarchy according to their level of generality/specificity. Experiments on the Newsgroups and the Reuters-21578 dataset indicate improved performance of the proposed classifier in comparison to other state-of-the-art methods on datasets with a small number of positive examples.
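The idea of estimating term distributions across levels of a topic hierarchy can be illustrated with a small sketch. It assumes, as a simplification that the abstract does not spell out, that the probability of a word under a leaf category is a weighted mix of the word distributions attached to the nodes on the root-to-leaf path. The function name `mixed_word_prob`, the dictionaries, and the weights are all hypothetical and are not taken from the paper.

```python
def mixed_word_prob(word, path_dists, weights):
    """Probability of `word` under a leaf category.

    path_dists: word distributions (dicts) for the nodes on the
                root-to-leaf path, general nodes first (assumed layout).
    weights:    one mixing weight per node; they must sum to 1.
    """
    if len(path_dists) != len(weights):
        raise ValueError("need one weight per node on the path")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    # General words get their mass from nodes near the root,
    # specific words from nodes near the leaf.
    return sum(w * dist.get(word, 0.0) for w, dist in zip(weights, path_dists))


# Toy two-level hierarchy: a root node and one leaf below it.
root = {"the": 0.5, "ball": 0.1}
leaf = {"ball": 0.6, "goal": 0.4}
p_ball = mixed_word_prob("ball", [root, leaf], [0.5, 0.5])
```

With few training documents per leaf, a mix of this kind lets a sparse leaf estimate borrow strength from its better-estimated ancestors, which is the sparseness problem the abstract targets.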


acm/ieee joint conference on digital libraries | 2004

A document corpus browser for in-depth reading

Eric A. Bier; Lance E. Good; Kris Popat; Alan Newberger

Software tools, including Web browsers, e-books, electronic document formats, search engines, and digital libraries, are changing the way people read, making it easier for them to find and view documents. However, while these tools provide significant help with short-term reading projects involving small numbers of documents, they provide less help with longer-term reading projects, in which a topic is to be understood in depth by reading many documents. For such projects, readers must find and manage many documents and citations, remember what has been read, and prioritize what to read next. We describe three integrated software tools that facilitate in-depth reading. A first tool extracts citation information from documents. A second finds on-line documents from their citations. The last is a document corpus browser that uses a zoomable user interface to show a corpus at multiple granularities while supporting reading tasks that take days, weeks, or longer. We describe these tools and the design principles that motivated them.


international conference on acoustics, speech, and signal processing | 2001

Decoding of text lines in grayscale document images

Kris Popat

The Document Image Decoding (DID) framework for recognizing printed text in images has been shown in previous work to achieve extremely high recognition accuracy when its models are well matched to the data. To date, DID has been restricted to binary images, in part for computational reasons, and in part because binary scanning is widely available and often of sufficient spatial resolution to make the use of grayscale information unnecessary for reliable recognition. Advances in computer speed and memory, along with the emergence of low-cost digital still cameras and similar devices as alternatives to traditional scanners, motivate the extension of the DID formalism to the low-spatial-resolution grayscale and color domains. To do so requires substantially generalizing DID's image-formation and degradation models. This paper lays out an approach and presents preliminary results on real data.


acm/ieee joint conference on digital libraries | 2004

Zoomable user interface for in-depth reading

Eric A. Bier; Kris Popat; Lance E. Good; Alan Newberger

The Instant Bookplex system includes a zoomable user interface (ZUI) for navigating through a spatial representation of a document collection. This ZUI supports extended reading in the collection using semantic zooming, graphical presentation of metadata, animated transitions, and an integrated reading tool. It helps users find and re-find documents, choose good documents to read next, and navigate between documents.


international symposium on memory management | 2002

Two-Stage Lossy/Lossless Compression of Grayscale Document Images

Kris Popat; Dan S. Bloomberg

This paper describes a two-stage method of document image compression wherein a grayscale document image is first processed to improve its compressibility, then losslessly compressed. The initial processing involves hierarchical, coarse-to-fine morphological operations designed to combat the noiselike variability of the low-order bits while attempting to preserve or even improve intelligibility. The result of this stage is losslessly compressed by an arithmetic coder that uses a mixture model to derive context-conditional graylevel probabilities. The lossless stage is compared experimentally with several reference methods, and is found to be competitive at all rates. The overall system is found to be comparable with JPEG in terms of mean-square error performance, but appears to outperform JPEG in terms of subjectively judged document image intelligibility.


international conference on pattern recognition | 2002

Adaptive stack algorithm in document image decoding

Kris Popat; Daniel H. Greene; Tze-Lei Poo

The stack algorithm, which is a best-first search algorithm widely used in speech recognition, is modified for application to the problem of recognizing machine printed text in the document image decoding (DID) framework. An iterative scheme is described wherein successively more stringent stack searches are performed, each time using a model of the image that is updated on the basis of what was discovered on the previous iteration. In this way, the algorithm can adapt to realistic degradation patterns that are irregular and perhaps not well described by stationary models. The contribution of this work is twofold: (1) it represents a reliable method of estimating suitable parameter values for stack decoding in DID, and (2) as a means of handling nonstationary degradation, it presents an alternative to another recently developed approach that is described elsewhere, the iterated complete path algorithm, at potentially lower computational cost. Preliminary results are presented on text line images with simulated nonstationary noise.
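For readers unfamiliar with stack decoding, the following is a minimal sketch of a generic best-first search over label sequences. It is not the authors' DID decoder and omits the adaptive, iterated part described above. The function `stack_decode` and the `step_costs` table are hypothetical, and the sketch assumes non-negative per-position costs (for example, negative log-probabilities) ranked by plain accumulated cost.

```python
import heapq


def stack_decode(step_costs):
    """Best-first ("stack") search for the cheapest label sequence.

    step_costs[t][label] is an assumed non-negative cost of emitting
    `label` at position t. Returns (total_cost, labels).
    """
    n = len(step_costs)
    # The stack is a priority queue of (accumulated cost, partial path).
    stack = [(0.0, ())]
    while stack:
        cost, path = heapq.heappop(stack)  # most promising hypothesis
        if len(path) == n:
            # With non-negative costs, the first complete path popped
            # is a cheapest one.
            return cost, list(path)
        # Extend the hypothesis by every label allowed at the next position.
        for label, c in step_costs[len(path)].items():
            heapq.heappush(stack, (cost + c, path + (label,)))
    return None  # reached only if some position offers no labels


# Toy example with two positions and two candidate labels each.
toy_costs = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 0.5}]
best_cost, best_labels = stack_decode(toy_costs)
```

Practical stack decoders usually prune the queue and use a length-normalized score so that hypotheses of different lengths compare fairly; both refinements are left out here for brevity.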


document analysis systems | 2002

Human Interactive Proofs and Document Image Analysis

Henry S. Baird; Kris Popat


document engineering | 2003

UpLib: a universal personal digital library system

William C. Janssen; Kris Popat


international conference on pattern recognition | 2002

Paper to PDA

Thomas M. Breuel; William C. Janssen; Kris Popat; Henry S. Baird

Collaboration


Dive into Kris Popat's collaborations.

Top Co-Authors

Alan Newberger

University of California
