Publication


Featured research published by Shaolei Feng.


Computer Vision and Pattern Recognition | 2004

Multiple Bernoulli relevance models for image and video annotation

Shaolei Feng; R. Manmatha; Victor Lavrenko

Retrieving images in response to textual queries requires some knowledge of the semantics of the picture. Here, we show how we can do both automatic image annotation and retrieval (using one-word queries) from images and videos using a multiple Bernoulli relevance model. The model assumes that a training set of images or videos along with keyword annotations is provided. Multiple keywords are provided for an image and the specific correspondence between a keyword and an image is not provided. Each image is partitioned into a set of rectangular regions and a real-valued feature vector is computed over these regions. The relevance model is a joint probability distribution of the word annotations and the image feature vectors and is computed using the training set. The word probabilities are estimated using a multiple Bernoulli model and the image feature probabilities using a non-parametric kernel density estimate. The model is then used to annotate images in a test set. We show experiments on both images from a standard Corel data set and a set of video key frames from NIST's Video TREC. Comparative experiments show that the model performs better than a model based on estimating word probabilities using the popular multinomial distribution. The results also show that our model significantly outperforms previously reported results on the task of image and video annotation.
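The annotation step described above can be sketched as follows. This is a simplified stand-in for the paper's model: a Gaussian kernel over whole-image feature vectors (the paper computes kernels over region features), with an arbitrary smoothing constant `mu` for the Bernoulli word probabilities.

```python
import math

def gaussian_kernel(x, y, beta=1.0):
    """Non-parametric kernel density term for a test feature vector x
    given one training image's feature vector y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-beta * d2)

def bernoulli_annotation_scores(test_feat, train_feats, train_words, vocab, mu=0.1):
    """Score each vocabulary word for a test image by averaging smoothed
    Bernoulli word probabilities over training images, weighted by the
    kernel similarity of their feature vectors."""
    weights = [gaussian_kernel(test_feat, f) for f in train_feats]
    z = sum(weights) or 1.0
    scores = {}
    for w in vocab:
        # smoothed Bernoulli: a word either occurs in a training annotation or not
        p = sum(wt * ((1 - mu) * (w in words) + mu * 0.5)
                for wt, words in zip(weights, train_words))
        scores[w] = p / z
    return scores
```

Annotation then amounts to keeping the top-k words by score; retrieval with one-word queries ranks images by the score of the query word.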


International Conference on Acoustics, Speech, and Signal Processing | 2004

Statistical models for automatic video annotation and retrieval

Victor Lavrenko; Shaolei Feng; R. Manmatha

We apply a continuous relevance model (CRM) to the problem of directly retrieving the visual content of videos using text queries. The model computes a joint probability model for image features and words using a training set of annotated images. The model may then be used to annotate unseen test images. The probabilistic annotations are used for retrieval using text queries. We also propose a modified model - the normalized CRM - which substantially improves performance on a subset of the TREC video dataset.
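A minimal sketch of the retrieval step, assuming per-image word probabilities have already been produced by a relevance model; the `normalize` step stands in for the normalized-CRM idea of rescaling annotation probabilities across the vocabulary before ranking.

```python
def normalize(scores):
    """Normalized-CRM style step (an assumption here): rescale one image's
    word probabilities so they sum to one before ranking."""
    z = sum(scores.values()) or 1.0
    return {w: s / z for w, s in scores.items()}

def rank_by_query(query_words, per_image_word_scores):
    """Rank images/keyframes by the product of their normalized
    probabilities for each query word."""
    ranked = []
    for image_id, scores in per_image_word_scores.items():
        p = normalize(scores)
        score = 1.0
        for w in query_words:
            score *= p.get(w, 1e-9)  # floor for unannotated words
        ranked.append((score, image_id))
    ranked.sort(reverse=True)
    return [i for _, i in ranked]
```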


Computer Vision and Pattern Recognition | 2003

Using Corner Feature Correspondences to Rank Word Images by Similarity

Jamie L. Rothfeder; Shaolei Feng; Toni M. Rath

Libraries contain enormous numbers of handwritten historical documents which cannot be made available on-line because they do not have a searchable index. The wordspotting idea has previously been proposed as a solution for creating indexes for such documents and collections by matching word images. In this paper we present an algorithm which compares whole word images based on their appearance. This algorithm recovers correspondences of points of interest in two images, and then uses these correspondences to construct a similarity measure. This similarity measure can then be used to rank word images in order of their closeness to a query image. We achieved an average precision of 62.57% on a set of 2372 images of reasonable quality and an average precision of 15.49% on a set of 3262 images from documents of poor quality that are hard for even humans to read.
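The ranking idea can be illustrated with a toy version that replaces the paper's patch-based corner matching with nearest-point matching; only the overall shape (recover correspondences, then turn their residuals into a similarity) is kept.

```python
import math

def match_corners(corners_a, corners_b):
    """Greedy correspondence recovery (a simplification of the paper's
    patch-based matching): pair each corner in A with its nearest corner in B."""
    pairs = []
    for ax, ay in corners_a:
        bx, by = min(corners_b, key=lambda c: (c[0] - ax) ** 2 + (c[1] - ay) ** 2)
        pairs.append(((ax, ay), (bx, by)))
    return pairs

def word_image_similarity(corners_a, corners_b):
    """Turn correspondence residuals into a similarity usable for ranking
    word images against a query image."""
    pairs = match_corners(corners_a, corners_b)
    mean_d = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return 1.0 / (1.0 + mean_d)
```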


ACM/IEEE Joint Conference on Digital Libraries | 2006

A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books

Shaolei Feng; R. Manmatha

A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book-length material is needed. Evaluating OCR performance on an entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a hidden Markov model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be done rapidly and effectively. Experimental results show that our hierarchical alignment approach works very well even if the OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.
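The divide-and-conquer idea can be sketched without the HMM machinery: anchor on words that occur exactly once in both sequences, then align each small chunk independently (here with `difflib` in place of the paper's HMM alignment).

```python
from collections import Counter
from difflib import SequenceMatcher

def anchor_points(ocr_words, truth_words):
    """Find words unique to both sequences; they become anchors that split
    a book-length alignment into many small independent pieces."""
    ca, cb = Counter(ocr_words), Counter(truth_words)
    unique = {w for w in ca if ca[w] == 1 and cb.get(w) == 1}
    pa = {w: i for i, w in enumerate(ocr_words) if w in unique}
    pb = {w: i for i, w in enumerate(truth_words) if w in unique}
    # keep only anchors whose order agrees in both sequences
    result, last_b = [], -1
    for w, ia in sorted(pa.items(), key=lambda kv: kv[1]):
        if pb[w] > last_b:
            result.append((ia, pb[w]))
            last_b = pb[w]
    return result

def hierarchical_align(ocr_words, truth_words):
    """Align within the chunks delimited by anchors; return matched index pairs."""
    cuts = ([(0, 0)] + anchor_points(ocr_words, truth_words)
            + [(len(ocr_words), len(truth_words))])
    matches = []
    for (a0, b0), (a1, b1) in zip(cuts, cuts[1:]):
        sm = SequenceMatcher(None, ocr_words[a0:a1], truth_words[b0:b1])
        for blk in sm.get_matching_blocks():
            for k in range(blk.size):
                matches.append((a0 + blk.a + k, b0 + blk.b + k))
    return matches
```

Each chunk alignment is quadratic only in the chunk length, which is what makes whole-book alignment fast even with OCR errors and missing material.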


ACM Multimedia | 2005

Joint visual-text modeling for automatic retrieval of multimedia documents

Giridharan Iyengar; Pinar Duygulu; Shaolei Feng; Pavel Ircing; Sanjeev Khudanpur; Dietrich Klakow; M. R. Krause; R. Manmatha; Harriet J. Nock; D. Petkova; Brock Pytlik; Paola Virga

In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval (IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework. In state-of-the-art systems, a late combination of two independent systems, one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing, is the norm. Such systems rarely exceed the performance of any single modality (i.e. text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in significant improvement in performance over any single modality. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news videos. Our results demonstrate over 14% improvement in IR performance over the best reported text-only baseline and rank amongst the best results reported on this corpus.
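A toy illustration of combining modality scores per document; this log-linear combination is a generic stand-in, not the paper's joint model, and the weight `alpha` is arbitrary.

```python
import math

def combine_scores(text_score, visual_score, alpha=0.6):
    """Log-linear fusion of one document's text and visual retrieval scores
    (an illustrative stand-in for the paper's joint framework)."""
    return alpha * math.log(text_score) + (1 - alpha) * math.log(visual_score)

def rank_documents(docs):
    """docs: {doc_id: (text_score, visual_score)} -> doc ids, best first."""
    return sorted(docs, key=lambda d: combine_scores(*docs[d]), reverse=True)
```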


Second International Conference on Document Image Analysis for Libraries (DIAL'06) | 2006

Exploring the use of conditional random field models and HMMs for historical handwritten document recognition

Shaolei Feng; R. Manmatha; Andrew McCallum

In this paper we explore different approaches for improving the performance of dependency models on discrete features for handwriting recognition. Hidden Markov models have often been used for handwriting recognition. Conditional random fields (CRFs) allow for more general dependencies and we investigate their use. We believe that this is the first attempt at applying CRFs to handwriting recognition. We show that on the whole-word recognition task, the CRF performs better than an HMM on a publicly available standard dataset of 20 pages of George Washington's manuscripts. The state space for the whole-word recognition task is large - almost 1200 states. To make CRF computation tractable we use beam search to make inference more efficient, using three different approaches. A larger improvement can be obtained using the HMM by directly smoothing the discrete features using the collection frequencies. This shows the importance of smoothing and also indicates the difficulty of training CRFs when large state spaces are involved.
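The beam-search idea can be sketched as a pruned Viterbi pass; the observation and transition scores here are placeholders, not the paper's trained potentials.

```python
def beam_viterbi(obs, trans, beam=3):
    """Beam-pruned Viterbi decoding: keep only the `beam` best partial paths
    at each position, which is what makes inference tractable when the state
    space has ~1200 states.
    obs: list of {state: log-score} per position.
    trans: {(prev_state, state): log-score}."""
    beams = sorted(((sc, [s]) for s, sc in obs[0].items()), reverse=True)[:beam]
    for step in obs[1:]:
        cand = []
        for sc, path in beams:
            for s, osc in step.items():
                t = trans.get((path[-1], s))
                if t is not None:
                    cand.append((sc + t + osc, path + [s]))
        beams = sorted(cand, reverse=True)[:beam]  # prune to the beam width
    return beams[0][1] if beams else []
```

Exact Viterbi would score every state at every position; pruning trades a small risk of losing the true best path for a large constant-factor speedup.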


Computer Vision and Pattern Recognition | 2009

Automatic fetal face detection from ultrasound volumes via learning 3D and 2D information

Shaolei Feng; S. Kevin Zhou; Sara Good; Dorin Comaniciu

3D ultrasound imaging has been increasingly used in clinics for fetal examination. However, manually searching for the optimal view of the fetal face in 3D ultrasound volumes is cumbersome and time-consuming even for expert physicians and sonographers. In this paper we propose a learning-based approach which combines both 3D and 2D information for automatic and fast fetal face detection from 3D ultrasound volumes. Our approach applies a new technique - constrained marginal space learning - for 3D face mesh detection, and combines a boosting-based 2D profile detection to refine 3D face pose. To enhance the rendering of the fetal face, an automatic carving algorithm is proposed to remove all obstructions in front of the face based on the detected face mesh. Experiments are performed on a challenging 3D ultrasound data set containing 1010 fetal volumes. The results show that our system not only achieves excellent detection accuracy but also runs very fast - it can detect the fetal face from the 3D data in 1 second on a dual-core 2.0 GHz computer.
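The coarse-to-fine search behind marginal space learning can be sketched schematically; `volume_score` stands in for the stage-wise trained detectors, and the pose parameterization here is deliberately simplified.

```python
def marginal_space_search(volume_score, positions, orientations, scales, keep=10):
    """Schematic of marginal-space-style detection (not the paper's trained
    system): estimate position first, then augment the surviving hypotheses
    with orientation, then with scale, pruning at each stage instead of
    scanning the full joint pose space.
    volume_score(pose) returns a detector score for a partial pose dict."""
    hyps = [{"pos": p} for p in positions]
    hyps = sorted(hyps, key=volume_score, reverse=True)[:keep]      # stage 1
    hyps = [dict(h, rot=r) for h in hyps for r in orientations]
    hyps = sorted(hyps, key=volume_score, reverse=True)[:keep]      # stage 2
    hyps = [dict(h, scale=s) for h in hyps for s in scales]
    return max(hyps, key=volume_score)                              # stage 3
```

The point of the staging is cost: with P positions, R orientations and S scales, the joint space has P*R*S hypotheses, while the staged search evaluates roughly P + keep*R + keep*S.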


Pattern Recognition | 2009

Finding words in alphabet soup: Inference on freeform character recognition for historical scripts

Nicholas R. Howe; Shaolei Feng; R. Manmatha

This paper develops word recognition methods for historical handwritten cursive and printed documents. It employs a powerful segmentation-free letter detection method based upon joint boosting with histograms of gradients as features. Efficient inference on an ensemble of hidden Markov models can select the most probable sequence of candidate character detections to recognize complete words in ambiguous handwritten text, drawing on character n-gram and physical separation models. Experiments with two corpora of handwritten historic documents show that this approach recognizes known words more accurately than previous efforts, and can also recognize out-of-vocabulary words.
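The inference step can be illustrated with a toy scorer over candidate character detections; the bigram table, confidence values, and gap threshold below are all illustrative, not the paper's trained models.

```python
import math

def chain_score(chain, bigram, max_gap=20):
    """Score one left-to-right chain of candidate character detections.
    Each detection is (x_position, char, detector_confidence); the score
    combines detector confidence, a character-bigram term, and a
    physical-separation check."""
    score = math.log(chain[0][2])
    for (x1, c1, _), (x2, c2, conf2) in zip(chain, chain[1:]):
        if not 0 < x2 - x1 <= max_gap:
            return -math.inf  # implausible character spacing
        score += math.log(conf2) + math.log(bigram.get((c1, c2), 1e-6))
    return score

def recognize(chains, bigram):
    """Pick the most probable chain among enumerated candidates and read
    off the recognized word."""
    best = max(chains, key=lambda ch: chain_score(ch, bigram))
    return "".join(c for _, c, _ in best)
```

Because the character model rather than a fixed lexicon drives the choice, a scheme like this can also assemble out-of-vocabulary words from strong detections.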


International Conference on Document Analysis and Recognition | 2005

Classification models for historical manuscript recognition

Shaolei Feng; R. Manmatha

This paper investigates different machine learning models to solve the historical handwritten manuscript recognition problem. In particular, we test and compare support vector machines, conditional maximum entropy models and Naive Bayes with kernel density estimates and explore their behaviors and properties when solving this problem. We focus on a whole-word problem to avoid having to do character segmentation, which is difficult with degraded handwritten documents. Our results on a publicly available standard dataset of 20 pages of George Washington's manuscripts show that Naive Bayes with Gaussian kernel density estimates significantly outperforms the other models and prior work using hidden Markov models on this heavily unbalanced dataset.
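The winning model can be sketched directly: a Naive Bayes classifier whose per-feature class-conditional densities are Gaussian kernel density estimates (bandwidth chosen arbitrarily here, and the word-image features left abstract).

```python
import math

def kde_log_likelihood(x, samples, bandwidth=1.0):
    """Gaussian kernel density estimate of one feature's class-conditional
    density, evaluated at x from the class's training samples."""
    vals = [math.exp(-((x - s) ** 2) / (2 * bandwidth ** 2)) for s in samples]
    dens = sum(vals) / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return math.log(dens + 1e-12)  # floor avoids log(0) far from all samples

def classify(feats, training):
    """training: {word_class: list of feature vectors}. Naive Bayes assumes
    the features are independent, so per-feature log-likelihoods simply add."""
    def score(cls):
        vecs = training[cls]
        return sum(kde_log_likelihood(f, [v[i] for v in vecs])
                   for i, f in enumerate(feats))
    return max(training, key=score)
```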


Conference on Image and Video Retrieval | 2008

A discrete direct retrieval model for image and video retrieval

Shaolei Feng; R. Manmatha

Collaboration


Dive into Shaolei Feng's collaborations.

Top Co-Authors

R. Manmatha, University of Massachusetts Amherst

Andrew McCallum, University of Massachusetts Amherst

Jamie L. Rothfeder, University of Massachusetts Amherst

Toni M. Rath, University of Massachusetts Amherst