Victor Lavrenko | Researchain

Publication

Featured research published by Victor Lavrenko.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Relevance-Based Language Models

Victor Lavrenko; W. Bruce Croft

We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.
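
The central estimate can be sketched in a few lines. This is a minimal illustration in the spirit of the paper, not its exact implementation: it assumes the top-ranked documents for the query are already available as token lists, uses a uniform document prior, and smooths each document model with the collection model using a hypothetical Jelinek-Mercer weight `lam`. P(w|R) is approximated by summing P(w|D) · P(q1..qk|D) over documents and normalizing.

```python
from collections import Counter
from math import prod

def relevance_model(query, docs, lam=0.6):
    """Estimate P(w|R) from the query alone (sketch).

    docs: token lists standing in for the top-ranked documents.
    lam: hypothetical Jelinek-Mercer weight on the document model.
    """
    coll = Counter(t for d in docs for t in d)
    coll_len = sum(coll.values())

    def p_w_given_d(w, counts, length):
        # Smoothed document model: mix document and collection frequencies.
        return lam * counts[w] / length + (1 - lam) * coll[w] / coll_len

    scores = Counter()
    for d in docs:
        counts, length = Counter(d), len(d)
        # P(q1..qk | D): how well this document's model explains the query.
        q_lik = prod(p_w_given_d(q, counts, length) for q in query)
        for w in coll:
            # Uniform document prior, so it cancels in the normalization.
            scores[w] += p_w_given_d(w, counts, length) * q_lik
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else {}
```

Words that co-occur with the query terms in well-matching documents (here, "stock" for the query "market") receive probability mass even though they never appear in the query, which is how the estimate addresses synonymy without training data.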

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Automatic image annotation and retrieval using cross-media relevance models

Jiwoon Jeon; Victor Lavrenko; R. Manmatha

Libraries have traditionally used manual image annotation for indexing and then later retrieving their image collections. However, manual image annotation is an expensive and labor-intensive procedure, and hence there has been great interest in automatic ways to retrieve images based on content. Here, we propose an automatic approach to annotating and retrieving images based on a training set of images. We assume that regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, we show that probabilistic models allow us to predict the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. We show that relevance models allow us to derive these probabilities in a natural way. Experiments show that the annotation performance of this cross-media relevance model is almost six times as good (in terms of mean precision) as a model based on word-blob co-occurrence and twice as good as a state-of-the-art model derived from machine translation. Our approach shows the usefulness of using formal information retrieval models for the task of image annotation and retrieval.
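
A rough sketch of the annotation step, assuming each image has already been reduced to a list of discrete blob ids: the joint estimate P(w, b1..bm) = Σ_J P(J) P(w|J) Π_i P(b_i|J) sums over training images J with a uniform P(J), and each image's word and blob counts are smoothed against the whole training set. The smoothing weights `alpha` and `beta` are placeholder values, not figures taken from the paper.

```python
from collections import Counter
from math import prod

def cmrm_annotate(test_blobs, training, alpha=0.1, beta=0.9, top_k=3):
    """Rank words for an unannotated image given its blob ids (sketch).

    training: list of (words, blobs) pairs for annotated images.
    alpha, beta: hypothetical smoothing weights for words and blobs.
    """
    word_bg = Counter(w for ws, _ in training for w in ws)
    blob_bg = Counter(b for _, bs in training for b in bs)
    n_w, n_b = sum(word_bg.values()), sum(blob_bg.values())
    scores = Counter()
    for words, blobs in training:
        wc, bc = Counter(words), Counter(blobs)
        size = len(words) + len(blobs)  # |J|: words plus blobs in image J
        # Product of smoothed P(b|J) over the test image's blobs.
        p_blobs = prod((1 - beta) * bc[b] / size + beta * blob_bg[b] / n_b
                       for b in test_blobs)
        for w in word_bg:
            p_w = (1 - alpha) * wc[w] / size + alpha * word_bg[w] / n_w
            scores[w] += p_w * p_blobs / len(training)  # uniform P(J)
    return [w for w, _ in scores.most_common(top_k)]
```

Retrieval with a one-word query reuses the same joint probability: score each test image by P(w, its blobs) and sort.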

Computer Vision and Pattern Recognition | 2004

Multiple Bernoulli relevance models for image and video annotation

Shaolei Feng; R. Manmatha; Victor Lavrenko

Retrieving images in response to textual queries requires some knowledge of the semantics of the picture. Here, we show how we can do both automatic image annotation and retrieval (using one-word queries) from images and videos using a multiple Bernoulli relevance model. The model assumes that a training set of images or videos along with keyword annotations is provided. Multiple keywords are provided for an image and the specific correspondence between a keyword and an image is not provided. Each image is partitioned into a set of rectangular regions and a real-valued feature vector is computed over these regions. The relevance model is a joint probability distribution of the word annotations and the image feature vectors and is computed using the training set. The word probabilities are estimated using a multiple Bernoulli model and the image feature probabilities using a non-parametric kernel density estimate. The model is then used to annotate images in a test set. We show experiments on both images from a standard Corel data set and a set of video key frames from NIST's Video TREC. Comparative experiments show that the model performs better than a model based on estimating word probabilities using the popular multinomial distribution. The results also show that our model significantly outperforms previously reported results on the task of image and video annotation.
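
The two estimators named above can be combined in a short sketch. It assumes region feature vectors are already computed, uses a multiple-Bernoulli word estimate P(w|J) = (mu·δ(w,J) + N_w) / (mu + N), where N_w is the number of training images annotated with w and N the number of training images, and a Gaussian kernel density over J's regions. The values of `mu` and `bandwidth` are hypothetical, and the kernel's normalizing constant is dropped because it is identical for every J and does not affect the ranking.

```python
from math import exp

def mbrm_annotate(test_regions, training, mu=1.0, bandwidth=1.0, top_k=3):
    """Rank words for a test image given its region feature vectors (sketch).

    training: list of (word_set, region_vectors) for annotated images.
    """
    vocab = sorted({w for ws, _ in training for w in ws})
    n_imgs = len(training)
    # N_w: number of training images whose annotation contains w.
    doc_freq = {w: sum(w in ws for ws, _ in training) for w in vocab}

    def kernel_density(r, regions):
        # Average Gaussian kernel over J's regions (normalizer omitted).
        return sum(exp(-sum((a - b) ** 2 for a, b in zip(r, g)) / (2 * bandwidth ** 2))
                   for g in regions) / len(regions)

    scores = dict.fromkeys(vocab, 0.0)
    for words, regions in training:
        p_regions = 1.0
        for r in test_regions:
            p_regions *= kernel_density(r, regions)
        for w in vocab:
            # Multiple-Bernoulli estimate of P(w|J).
            p_w = (mu * (w in words) + doc_freq[w]) / (mu + n_imgs)
            scores[w] += p_w * p_regions / n_imgs  # uniform P(J)
    return sorted(vocab, key=scores.get, reverse=True)[:top_k]
```

Unlike a multinomial, the Bernoulli estimate does not force the words of an image to compete for a fixed probability mass, so an image with many annotations does not dilute each of them.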

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002

Cross-lingual relevance models

Victor Lavrenko; Martin Choquette; W. Bruce Croft

We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC-9. The model achieves performance around 95% of the strong monolingual baseline in terms of average precision. In initial precision, our model outperforms the monolingual baseline by 20%. The main contribution of this work is the unified formal model which integrates techniques that are essential for effective Cross-Language Retrieval.
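
Of the two estimation routes mentioned, the parallel-corpus one is the easier to sketch. Assuming aligned English/Chinese document pairs, uniform pair priors, and Jelinek-Mercer smoothing with a hypothetical weight `lam`, each pair contributes P(c|C) weighted by how well its English side explains the English query; no query or document is ever translated. The dictionary variant is not shown.

```python
from collections import Counter
from math import prod

def cross_lingual_rm(query_en, parallel_pairs, lam=0.6):
    """Estimate P(c|R) over target-language words from an English query (sketch).

    parallel_pairs: list of (english_tokens, chinese_tokens) aligned documents.
    """
    en_bg = Counter(t for e, _ in parallel_pairs for t in e)
    zh_bg = Counter(t for _, c in parallel_pairs for t in c)
    en_len, zh_len = sum(en_bg.values()), sum(zh_bg.values())

    def p(w, counts, length, bg, bg_len):
        # Smooth against the background model of the same language.
        return lam * counts[w] / length + (1 - lam) * bg[w] / bg_len

    scores = Counter()
    for en, zh in parallel_pairs:
        ec, zc = Counter(en), Counter(zh)
        # Likelihood of the English query under this pair's English side.
        q_lik = prod(p(q, ec, len(en), en_bg, en_len) for q in query_en)
        for c in zh_bg:
            scores[c] += p(c, zc, len(zh), zh_bg, zh_len) * q_lik
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}
```

Because the query likelihood favors pairs whose English side matches the whole query, translations of ambiguous query words are disambiguated by context, and related target-language words act as expansion terms.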

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002

James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal

Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout:
• User and context sensitive retrieval
• Multi-lingual and multi-media issues
• Better target tasks
• Improved objective evaluations
• Substantially more labeled data
• Greater variety of data sources
• Improved formal models

Contextual retrieval and global information access were identified as particularly important long-term challenges.

First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings. | 2004

Holistic word recognition for handwritten historical documents

Victor Lavrenko; Toni M. Rath; R. Manmatha

Most offline handwriting recognition approaches proceed by segmenting words into smaller pieces (usually characters) which are recognized separately. The recognition result of a word is then the composition of the individually recognized parts. Inspired by results in cognitive psychology, researchers have begun to focus on holistic word recognition approaches. Here we present a holistic word recognition approach for single-author historical documents, which is motivated by the fact that for severely degraded documents a segmentation of words into characters will produce very poor results. The quality of the original documents does not allow us to recognize them with high accuracy; our goal here is to produce transcriptions that will allow successful retrieval of images, which has been shown to be feasible even in such noisy environments. We believe that this is the first systematic approach to recognizing words in historical manuscripts with extensive experiments. Our experiments show recognition accuracy of 65%, which exceeds the performance of other systems that operate on non-degraded input images (nonhistorical documents).
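
To make "holistic" concrete, the sketch below classifies a whole word image at once from a single feature vector, never splitting it into characters. It is a deliberately simple stand-in, not the model used in the paper: it assumes a hypothetical fixed-length feature vector per word image (for example width, height, or projection-profile statistics) and fits one diagonal Gaussian per vocabulary word from transcribed training pages.

```python
from math import log, pi

def train_holistic(samples, var_floor=1e-3):
    """Fit a diagonal Gaussian per word from (word_label, feature_vector) samples."""
    by_word = {}
    for word, feats in samples:
        by_word.setdefault(word, []).append(feats)
    models = {}
    for word, vecs in by_word.items():
        dims = len(vecs[0])
        means = [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]
        # Variance floor keeps rare words (few samples) from collapsing to zero.
        variances = [max(var_floor, sum((v[i] - means[i]) ** 2 for v in vecs) / len(vecs))
                     for i in range(dims)]
        models[word] = (means, variances, len(vecs) / len(samples))
    return models

def recognize(feats, models):
    """Return the word whose Gaussian (times its prior) best explains the whole image."""
    def log_score(model):
        means, variances, prior = model
        ll = sum(-0.5 * (log(2 * pi * v) + (x - m) ** 2 / v)
                 for x, m, v in zip(feats, means, variances))
        return ll + log(prior)
    return max(models, key=lambda w: log_score(models[w]))
```

The vocabulary is limited to words seen in the transcribed pages, which is one reason holistic recognizers are usually applied to single-author collections like the ones described above.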

Conference on Information and Knowledge Management | 2000

First story detection in TDT is hard

James Allan; Victor Lavrenko; Hubert Jin

We discuss two Topic Detection and Tracking (TDT) event-based information organization tasks: tracking and first story detection. We show that when a first story detection system is based upon tracking technology, we can expect that performance will be poor. That prediction is consistent with actual performance in TDT evaluations. We then show that to achieve high-quality first story detection, tracking effectiveness must improve to a degree that experience suggests is unlikely. We conclude that effective first story detection is either impossible or requires substantially different approaches.
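
The tracking-based first story detection discussed here can be sketched as follows: every earlier story effectively becomes a tracking query, and a new story is flagged as a first story when none of them matches it well. The bag-of-words cosine similarity and the `threshold` value are illustrative assumptions, not the configuration evaluated in the paper.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def first_stories(stream, threshold=0.2):
    """Flag each story whose best match among all earlier stories is below threshold."""
    seen, flags = [], []
    for text in stream:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)  # nothing similar seen: treat as a new event
        seen.append(vec)
    return flags

print(first_stories(["earthquake hits city",
                     "city earthquake death toll rises",
                     "election results announced"]))  # [True, False, True]
```

The sketch makes the paper's argument visible: a single tracking error against any of the earlier stories (a missed match or a false match) directly produces a first story detection error, so detection quality is bounded by tracking quality.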

Polymer | 2001

Rheology and structure of isotactic polypropylene near the gel point: quiescent and shear-induced crystallization

Natalia V. Pogodina; Victor Lavrenko; Srivatsan Srinivas; H. Henning Winter

Crystallizing isotactic polypropylene (iPP) develops large-scale spherulites and thick threads, large enough for observation by optical microscopy, and undergoes a liquid-to-solid transition as an expression of increased connectivity of the structure. In order to relate the time scales of structural and rheological changes, we measured time-resolved small-angle light scattering (SALS) and transmittance properties in a single experimental run, which then was repeated in an optical microscope for direct observation of growth of large-scale structures, and in a rheometer for mechanical spectroscopy. The results for quiescent and shear-enhanced melt crystallization of a high molar mass iPP are presented. In quiescent crystallization, iPP nuclei are only formed in the beginning (and not in later stages) and grow simultaneously at the same rate, which leads to spherulites of equal size. The critical gel point is reached close to the instant of the maximum of density fluctuations, but before spherulites impinge. Crystallinity estimates from Hv SALS (estimation method of Stein) are much higher than values from DSC. The discrepancy may be caused (in addition to the simplifying assumptions in the estimate) by the enhanced crystallization in the rheo-optical cell due to surface and sample loading effects. Shear flow induces anisotropic molecular conformations, preferentially in the high molecular weight component of the iPP sample. The resulting orientation fluctuations (of highly oriented long chains and less oriented short chains) cause (1) an increase in nucleation rate, (2) possibly an increase in crystallization rate and (3) formation of highly elongated structures (threads) which are visible in the optical microscope and in anisotropic SALS patterns. The threads thicken until, at later stages, additional spherulites start to grow, presumably from the shorter chains and nucleated on by the threads.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

A search engine for historical manuscript images

Toni M. Rath; R. Manmatha; Victor Lavrenko

Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access to them. These collections are only available in image formats and require expensive manual annotation work for access to them. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution between features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We show experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. Experiments show that the precision at 20 documents is about 0.4 to 0.5 depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts using text queries, without manual transcription of the original corpus.
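
The abstract does not spell out either retrieval model, so this is only a hedged sketch of the query and evaluation side. It assumes some trained model has already produced hypothetical per-image probabilities P(word | image), ranks images by the product of those probabilities over the query words, and computes precision at k, the measure reported above with k = 20.

```python
def rank_by_query(query, annotation_probs, floor=1e-9):
    """Rank image ids by the product of P(query word | image) over query words.

    annotation_probs: hypothetical {image_id: {word: probability}} mapping,
    assumed to come from a model trained on transcribed pages.
    """
    def score(image_id):
        p = 1.0
        for word in query.lower().split():
            # Small floor so an unseen word lowers, but does not zero, the score.
            p *= annotation_probs[image_id].get(word, floor)
        return p
    return sorted(annotation_probs, key=score, reverse=True)

def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top k ranked items that are relevant (always divides by k)."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k
```

Dividing by k even when fewer than k items are returned penalizes short result lists, which is the usual convention for precision at a fixed cutoff.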

Conference on Information and Knowledge Management | 2000

Language models for financial news recommendation

Victor Lavrenko; Matthew D. Schmill; Dawn J. Lawrie; Paul Ogilvie; David D. Jensen; James Allan

We present a unique approach to identifying news stories that influence the behavior of financial markets. Specifically, we describe the design and implementation of Ænalyst, a system that can recommend interesting news stories: stories that are likely to affect market behavior. Ænalyst operates by correlating the content of news stories with trends in financial time series. We identify trends in time series using piecewise linear fitting and then assign labels to the trends according to an automated binning procedure. We use language models to represent patterns of language that are highly associated with particular labeled trends. Ænalyst can then identify and recommend news stories that are highly indicative of future trends. We evaluate the system in terms of its ability to recommend the stories that will affect the behavior of the stock market. We demonstrate that stories recommended by Ænalyst could be used to profitably predict forthcoming trends in stock prices.
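
The trend-labeling step can be illustrated with one common form of piecewise linear fitting: top-down splitting, where a segment is split at the point farthest from the straight line joining its endpoints until every point is within a tolerance. Both the splitting rule and the fixed slope bins below are assumptions for illustration; the abstract says only that the fit is piecewise linear and that the binning is automated.

```python
def segment(series, start, end, tol):
    """Recursively split [start, end] until every point lies within tol of the chord."""
    y0, y1 = series[start], series[end]
    worst, worst_i = 0.0, None
    for i in range(start + 1, end):
        # Vertical distance from point i to the line joining the two endpoints.
        y_line = y0 + (y1 - y0) * (i - start) / (end - start)
        dev = abs(series[i] - y_line)
        if dev > worst:
            worst, worst_i = dev, i
    if worst_i is None or worst <= tol:
        return [(start, end)]
    return segment(series, start, worst_i, tol) + segment(series, worst_i, end, tol)

def label_trends(series, tol=1.0, bins=(-0.5, 0.5)):
    """Label each fitted segment 'down', 'flat', or 'up' by slope.

    series needs at least two points; bins are hypothetical fixed slope edges.
    """
    labels = []
    for s, e in segment(series, 0, len(series) - 1, tol):
        slope = (series[e] - series[s]) / (e - s)
        label = "down" if slope < bins[0] else "up" if slope > bins[1] else "flat"
        labels.append(((s, e), label))
    return labels

print(label_trends([10, 11, 12, 13, 14, 14, 14, 13, 12, 11]))
# [((0, 4), 'up'), ((4, 6), 'flat'), ((6, 9), 'down')]
```

Each labeled segment then defines a time window; news stories published just before it can be grouped by label to train one language model per trend type.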
