Paul Ogilvie | Researchain

Publication

Featured research published by Paul Ogilvie.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Combining document representations for known-item search

Paul Ogilvie; James P. Callan

This paper investigates the pre-conditions for successful combination of document representations formed from structural markup for the task of known-item search. As this task is very similar to work in meta-search and data fusion, we adapt several hypotheses from those research areas and investigate them in this context. To investigate these hypotheses, we present a mixture-based language model and also examine many of the current meta-search algorithms. We find that compatible output from systems is important for successful combination of document representations. We also demonstrate that combining low performing document representations can improve performance, but not consistently. We find that the techniques best suited for this task are robust to the inclusion of poorly performing document representations. We also explore the role of variance of results across systems and its impact on the performance of fusion, with the surprising result that the correct documents have higher variance across document representations than highly ranking incorrect documents.
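As a rough illustration of the kind of score fusion studied in meta-search, the sketch below implements CombSUM and CombMNZ, two standard data-fusion baselines. The abstract does not name the algorithms the paper examined, so their use here is an assumption, as are the min-max normalization, the function names, and the toy scores for the three document representations.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """CombSUM: add up each document's normalized scores across runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            fused[doc] += s
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of runs returning the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def rank(fused):
    """Sort document ids by descending fused score, breaking ties by id."""
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical retrieval scores from three document representations.
title_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
body_run = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
anchor_run = {"d2": 0.9, "d1": 0.1}
runs = [title_run, body_run, anchor_run]

print(rank(comb_sum(runs)))
print(rank(comb_mnz(runs)))
```

Normalizing each run before summing is one simple way to make the outputs of different representations compatible, which the abstract identifies as important for successful combination.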

Conference on Information and Knowledge Management | 2002

A language modeling framework for resource selection and results merging

Luo Si; Rong Jin; James P. Callan; Paul Ogilvie

Statistical language models have been proposed recently for several information retrieval tasks, including the resource selection task in distributed information retrieval. This paper extends the language modeling approach to integrate resource selection, ad-hoc searching, and merging of results from different text databases into a single probabilistic retrieval model. This new approach is designed primarily for Intranet environments, where it is reasonable to assume that resource providers are relatively homogeneous and can adopt the same kind of search engine. Experiments demonstrate that this new, integrated approach is at least as effective as the prior state-of-the-art in distributed IR.

ACM International Conference on Digital Libraries | 2000

Acrophile: an automated acronym extractor and server

Leah S. Larkey; Paul Ogilvie; M. Andrew Price; Brenden Tamilio

We implemented a web server for acronym and abbreviation lookup, containing a collection of acronyms and their expansions gathered from a large number of web pages by a heuristic extraction process. Several different extraction algorithms were evaluated and compared. The corpus resulting from the best algorithm is comparable to a high-quality hand-crafted site, but has the potential to be much more inclusive as data from more web pages are processed.

Conference on Information and Knowledge Management | 2000

Language models for financial news recommendation

Victor Lavrenko; Matthew D. Schmill; Dawn J. Lawrie; Paul Ogilvie; David D. Jensen; James Allan

We present a unique approach to identifying news stories that influence the behavior of financial markets. Specifically, we describe the design and implementation of Ænalyst, a system that can recommend interesting news stories: stories that are likely to affect market behavior. Ænalyst operates by correlating the content of news stories with trends in financial time series. We identify trends in time series using piecewise linear fitting and then assign labels to the trends according to an automated binning procedure. We use language models to represent patterns of language that are highly associated with particular labeled trends. Ænalyst can then identify and recommend news stories that are highly indicative of future trends. We evaluate the system in terms of its ability to recommend the stories that will affect the behavior of the stock market. We demonstrate that stories recommended by Ænalyst could be used to profitably predict forthcoming trends in stock prices.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Structured retrieval for question answering

Matthew W. Bilotti; Paul Ogilvie; Jamie Callan; Eric Nyberg

Bag-of-words retrieval is popular among Question Answering (QA) system developers, but it does not support constraint checking and ranking on the linguistic and semantic information of interest to the QA system. We present an approach to retrieval for QA, applying structured retrieval techniques to the types of text annotations that QA systems use. We demonstrate that the structured approach can retrieve more relevant results, more highly ranked, compared with bag-of-words, on a sentence retrieval task. We also characterize the extent to which structured retrieval effectiveness depends on the quality of the annotations.

INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval | 2004

Hierarchical language models for XML component retrieval

Paul Ogilvie; Jamie Callan

Experiments using hierarchical language models for XML component retrieval are presented in this paper. The role of context is investigated through incorporation of the parent's model. We find that context can improve the effectiveness of finding relevant components slightly. Additionally, biasing the results toward long components through the use of component priors improves exhaustivity but harms specificity, so care must be taken to find an appropriate trade-off.

Information Retrieval | 2009

On the number of terms used in automatic query expansion

Paul Ogilvie; Ellen M. Voorhees; Jamie Callan

This paper investigates the number of expansion terms to use in automatic query expansion by examining the behavior of eight retrieval systems participating in the NRRC Reliable Information Access Workshop. The results demonstrate that current systems are able to obtain nearly all of the benefit of using a fixed number of expansion terms per topic, but significant additional improvement is possible if systems were able to accurately select the best number of expansion terms on a per topic basis. When optimizing average effectiveness as measured by mean average precision, using a fixed number of terms increases the score a large amount for a small number of topics but has little effect for most topics. The analysis further suggests that when a topic is helped by automatic feedback, the increase is from a set of terms that reinforce each other rather than from the system finding a single excellent term.

Conference on Information and Knowledge Management | 2001

The effectiveness of query expansion for distributed information retrieval

Paul Ogilvie; James P. Callan

Query expansion has been shown effective for both single database retrieval and for distributed information retrieval where complete collection information is available. One might expect that query expansion would then work for distributed information retrieval when complete collection information is not available. However, this does not appear to be the case. When using local context analysis for query expansion in distributed retrieval with partial information, the most significant reason query expansion does not work is that merging scores of documents retrieved by expanded queries is very difficult. However, we have found that using sampled information for query expansion can give boosts in a single database environment, and that when more information is available, query expansion can work in distributed environments. We also show that most of the benefit of query expansion in distributed retrieval comes from finding good documents, and not from selecting good databases.

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval | 2005

Parameter estimation for a simple hierarchical generative model for XML retrieval

Paul Ogilvie; Jamie Callan

This paper explores the possibility of using a modified Expectation-Maximization algorithm to estimate parameters for a simple hierarchical generative model for XML retrieval. The generative model for an XML element is estimated by linearly interpolating statistical language models estimated from the text of the element, the parent element, the document element, and its child elements. We heuristically modify EM to allow the incorporation of negative examples, then attempt to maximize the likelihood of the relevant components while minimizing the likelihood of non-relevant components found in training data. The technique for incorporation of negative examples provides an effective algorithm to estimate the parameters in the linear combination mentioned. Some experiments are presented on the CO.Thorough task that support these claims.
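The linear interpolation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the unigram maximum-likelihood models, the whitespace tokenizer, the function names, the toy texts, and the fixed weights are all hypothetical. In the paper the weights are the parameters estimated by the modified EM procedure, which is not reproduced here.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram model: P(w) = count(w) / number of tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()} if total else {}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weights[i] * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocab = set().union(*models)
    return {
        w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
        for w in vocab
    }


def query_likelihood(query, model, floor=1e-9):
    """Probability of generating the query; unseen words get a small floor."""
    p = 1.0
    for w in query.lower().split():
        p *= model.get(w, floor)
    return p


# Hypothetical texts for one XML element and its context.
element = unigram_model("xml retrieval")
parent = unigram_model("xml retrieval models language")
document = unigram_model("language models for xml retrieval evaluation")
children = unigram_model("retrieval")

# Hypothetical fixed weights; the paper estimates these rather than fixing them.
weights = [0.4, 0.2, 0.2, 0.2]
mixed = interpolate([element, parent, document, children], weights)

print(mixed["xml"])
print(query_likelihood("xml retrieval", mixed))
```

Because each component model is a probability distribution and the weights sum to one, the interpolated model is also a distribution over the combined vocabulary.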

Conference on Information and Knowledge Management | 2006

Investigating the exhaustivity dimension in content-oriented XML element retrieval evaluation

Paul Ogilvie; Mounia Lalmas

INEX, the evaluation initiative for content-oriented XML retrieval, has since its establishment defined the relevance of an element according to two graded dimensions, exhaustivity and specificity. The former measures how exhaustively an XML element discusses the topic of request, whereas specificity measures how focused the element is on the topic of request. The reason for having two dimensions was to provide a more stable measure of relevance than if assessors were asked to rate the relevance of an element on a single scale. However, obtaining relevance assessments is a costly task, as each document must be assessed for relevance by a human assessor. In XML retrieval this problem is exacerbated as the elements of the document must also be assessed with respect to the exhaustivity and specificity dimensions. A continuous discussion in INEX has been whether such a sophisticated definition of relevance, and in particular the exhaustivity dimension, was needed. This paper attempts to answer this question through extensive statistical tests to compare the conclusions about system performance that could be made under different assessment scenarios.
