Nicholas Lester | Researchain

Publication

Featured research published by Nicholas Lester.

Conference on Information and Knowledge Management | 2005

Fast on-line index construction by geometric partitioning

Nicholas Lester; Alistair Moffat; Justin Zobel

Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for a variety of querying modes. In this paper we examine the task of on-line index construction -- that is, how to build an inverted index when the underlying data must be continuously queryable, and the documents must be indexed and available for search as soon as they are inserted. When straightforward approaches are used, document insertions become increasingly expensive as the size of the database grows. This paper describes a mechanism based on controlled partitioning that can be adapted to suit different balances of insertion and querying operations, and is faster and scales better than previous methods. Using experiments on 100GB of web data we demonstrate the efficiency of our methods in practice, showing that they dramatically reduce the cost of on-line index construction.

Information Processing and Management | 2006

Efficient online index maintenance for contiguous inverted lists

Nicholas Lester; Justin Zobel; Hugh E. Williams

Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well-known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query evaluation. The in-place and re-merge strategies are benchmarked against the baseline of a complete re-build. Our experiments with large volumes of web data show that re-merge is the fastest approach if large buffers are available, but that even a simple implementation of in-place update is suitable when the rate of insertion is low or memory buffer size is limited. We also show that with careful design of aspects of implementation such as free-space management, in-place update can be improved by around an order of magnitude over a naive implementation.
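The contrast between the two maintenance strategies named in this abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the function names `remerge` and `inplace`, the per-term `capacity` table, and the doubling rule on relocation are all assumptions made for illustration. It only counts postings written, and so ignores the sequential-versus-random disk access costs that drive the paper's measured results.

```python
def remerge(index, buffer):
    """Re-merge: write a whole new index, merging old lists with buffered postings.

    Returns (new_index, postings_written). The old index is left untouched.
    """
    new_index = {}
    written = 0
    for term in sorted(set(index) | set(buffer)):
        # Buffered postings are assumed to carry larger doc ids than stored ones.
        new_index[term] = index.get(term, []) + buffer.get(term, [])
        written += len(new_index[term])
    return new_index, written


def inplace(index, capacity, buffer):
    """In-place: append to each touched list, relocating it when it outgrows its space.

    `capacity[term]` is a hypothetical count of postings that fit in the
    region currently allocated to that list. Returns postings_written.
    """
    written = 0
    for term, new_postings in buffer.items():
        old = index.setdefault(term, [])
        needed = len(old) + len(new_postings)
        if needed > capacity.get(term, 0):
            # List no longer fits: copy it to a larger region (assumed doubling).
            capacity[term] = 2 * needed
            written += len(old)
        old.extend(new_postings)
        written += len(new_postings)
    return written
```

In this model re-merge rewrites every list on each update, while in-place touches only the terms present in the buffer and pays extra only when a list must be relocated. Both keep each list contiguous, which is the constraint the abstract states.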

ACM Transactions on Database Systems | 2008

Efficient online index construction for text databases

Nicholas Lester; Alistair Moffat; Justin Zobel

Inverted index structures are a core element of current text retrieval systems. They can be constructed quickly using offline approaches, in which one or more passes are made over a static set of input data, and, at the completion of the process, an index is available for querying. However, there are search environments in which even a small delay in timeliness cannot be tolerated, and the index must always be queryable and up to date. Here we describe and analyze a geometric partitioning mechanism for online index construction that provides a range of tradeoffs between costs, and can be adapted to different balances of insertion and querying operations. Detailed experimental results are provided that show the extent of these tradeoffs, and that these new methods can yield substantial savings in online indexing costs.
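The idea of geometric partitioning can be sketched as follows. This is a minimal in-memory illustration, not the paper's implementation: the class name `GeometricIndex`, the parameters `buffer_limit` and `radix`, and the capacity rule (level k holds at most (radix - 1) * radix^k buffer-loads of postings) are assumptions chosen to show the behaviour. A full buffer is merged into the smallest partition, and any partition that overflows is merged into the next, larger one.

```python
import heapq
from collections import defaultdict


class GeometricIndex:
    """Toy online inverted index whose partitions grow geometrically in size."""

    def __init__(self, buffer_limit=4, radix=2):
        self.buffer_limit = buffer_limit   # postings held in memory before a flush
        self.radix = radix                 # growth factor between partition levels
        self.buffer = defaultdict(list)    # term -> sorted list of doc ids
        self.buffered = 0
        self.partitions = []               # partitions[k]: term -> sorted doc ids

    def _capacity(self, level):
        # Assumed rule: capacity grows by a factor of `radix` per level.
        return (self.radix - 1) * self.radix ** level * self.buffer_limit

    @staticmethod
    def _size(partition):
        return sum(len(postings) for postings in partition.values())

    @staticmethod
    def _merge(a, b):
        merged = {}
        for term in set(a) | set(b):
            merged[term] = list(heapq.merge(a.get(term, []), b.get(term, [])))
        return merged

    def insert(self, doc_id, terms):
        for term in set(terms):
            self.buffer[term].append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self._flush()

    def _flush(self):
        carry = dict(self.buffer)
        self.buffer = defaultdict(list)
        self.buffered = 0
        level = 0
        while True:
            if level == len(self.partitions):
                self.partitions.append({})
            carry = self._merge(self.partitions[level], carry)
            if self._size(carry) <= self._capacity(level):
                self.partitions[level] = carry
                return
            # Overflow: empty this level and push the merged run upward.
            self.partitions[level] = {}
            level += 1

    def postings(self, term):
        # A query must consult every partition plus the in-memory buffer.
        lists = [p.get(term, []) for p in self.partitions]
        lists.append(self.buffer.get(term, []))
        return list(heapq.merge(*lists))
```

A short usage example, assuming documents arrive with increasing ids:

```python
idx = GeometricIndex(buffer_limit=4, radix=2)
for doc_id in range(20):
    idx.insert(doc_id, ["a", f"t{doc_id % 3}"])
print(idx.postings("a"))
```

A larger radix means fewer partitions to consult per query but more merging work per insertion. This is the kind of tradeoff between insertion and querying costs that the abstract describes.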

Web Information Systems Engineering | 2005

Space-limited ranked query evaluation using adaptive pruning

Nicholas Lester; Alistair Moffat; William Webber; Justin Zobel

Evaluation of ranked queries on large text collections can be costly in terms of processing time and memory space. Dynamic pruning techniques allow both costs to be reduced, at the potential risk of decreased retrieval effectiveness. In this paper we describe an improved query pruning mechanism that offers a more resilient tradeoff between query evaluation costs and retrieval effectiveness than do previous pruning approaches.

International Conference on Data Mining | 2004

Scalable multi-relational association mining

Amanda Clare; Hugh E. Williams; Nicholas Lester

We propose the RADAR technique for multirelational data mining. This permits the mining of very large collections and provides a technique for discovering multirelational associations. Results show that RADAR is reliable and scalable for mining a large yeast homology collection, and that it does not have the main-memory scalability constraints of the Farmer and Warmr tools.

ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26 | 2004