Paul B. Thistlewaite
Australian National University
Publications
Featured research published by Paul B. Thistlewaite.
international world wide web conferences | 1999
David Hawking; Nick Craswell; Paul B. Thistlewaite; Donna Harman
A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This collection is being used in an evaluation framework within the Text Retrieval Conference (TREC) and will hopefully provide convincing answers to questions such as, “Can link information result in better rankings?”, “Do longer queries result in better answers?”, and, “Do TREC systems work well on Web data?” The snapshot and associated evaluation methods are described and an invitation is extended to participate. Preliminary results are presented for an effectiveness comparison of six TREC systems working on the snapshot collection against five well-known Web search systems working over the current Web. These suggest that the standard of document rankings produced by public Web search engines is by no means state-of-the-art.
ACM Transactions on Information Systems | 1999
David Hawking; Paul B. Thistlewaite
The problem of using a broker to select a subset of available information servers in order to achieve a good trade-off between document retrieval effectiveness and cost is addressed. Server selection methods which are capable of operating in the absence of global information, and where servers have no knowledge of brokers, are investigated. A novel method using Lightweight Probe queries (LWP method) is compared with several methods based on data from past query processing, while Random and Optimal server rankings serve as controls. Methods are evaluated, using TREC data and relevance judgments, by computing ratios, both empirical and ideal, of recall and early precision for the subset versus the complete set of available servers. Estimates are also made of the best-possible performance of each of the methods. LWP and Topic Similarity methods achieved best results, each being capable of retrieving about 60% of the relevant documents for only one-third of the cost of querying all servers. Subject to the applicable cost model, the LWP method is likely to be preferred because it is suited to dynamic environments. The good results obtained with a simple automatic LWP implementation were replicated using different data and a larger set of query topics.
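The broker-side idea behind the Lightweight Probe (LWP) method can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the probe size, and the crude scoring rule are all assumptions. The essential point it shows is that a cheap probe sent to every server yields a server ranking, and the full query is then forwarded only to the top-ranked subset.

```python
# A minimal sketch (names and scoring hypothetical) of probe-based server
# selection: the broker sends a cheap probe query to every server, ranks the
# servers by their probe results, and sends the full query only to the best.

def select_servers(servers, probe, full_query, budget, send):
    """servers: list of server ids.
    send(server, query, limit): returns a list of (doc_id, score) pairs.
    budget: how many servers to query in full (e.g. one third of them).
    The probe asks each server for only a handful of results, so its cost
    is small compared with running the full query everywhere."""
    scored = []
    for s in servers:
        hits = send(s, probe, 5)                    # lightweight probe
        goodness = sum(score for _, score in hits)  # crude server score
        scored.append((goodness, s))
    scored.sort(reverse=True)
    chosen = [s for _, s in scored[:budget]]
    results = []
    for s in chosen:
        results.extend(send(s, full_query, 100))    # full query, top servers only
    return chosen, results
```

Under this sketch's cost model, querying a third of the servers costs roughly a third of the full broadcast, plus the small fixed cost of the probes; the paper's result is that such a subset can still recover around 60% of the relevant documents.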
acm conference on hypertext | 1997
Paul B. Thistlewaite
Many researchers have noted the problems associated with manually created or maintained hyperdocument links, and the consequent need for automated methods. A number of techniques have been applied to the problem, including pattern-matching, information retrieval, and natural language processing. This paper describes a system for the automatic detection and management of structural and referential links. The paper also addresses the issues of link-set soundness and completeness, open link management, and the particular problems engendered by large volatile hyperbases.
Information Retrieval | 1999
David Hawking; Paul B. Thistlewaite; Donna Harman
Due to the popularity of Web search engines, a large proportion of real text retrieval queries are now processed over collections measured in tens or hundreds of gigabytes. A new Very Large Collection (VLC) has been created to support qualification, measurement and comparison of systems operating at this level and to permit the study of the properties of very large collections. The VLC is an extension of the well-known TREC collection and has been distributed under the same conditions. A simple set of efficiency and effectiveness measures has been defined to encourage comparability of reporting. The 20 gigabyte first edition of the VLC and a representative 10% sample have been used in a special interest track of the 1997 Text Retrieval Conference (TREC-6). The unaffordable cost of obtaining complete relevance assessments over collections of this scale is avoided by concentrating on early precision and relying on the core TREC collection to support detailed effectiveness studies. Results obtained by TREC-6 VLC track participants are presented here. All groups observed a significant increase in early precision as collection size increased. Explanatory hypotheses are advanced for future empirical testing. A 100 gigabyte second edition VLC (VLC2) has recently been compiled and distributed for use in TREC-7 in 1998.
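The early-precision measure relied on above is, in essence, precision at a fixed cutoff: only the top of each ranking needs relevance judgments, which is what makes evaluation affordable at this scale. A minimal sketch (the cutoff and data are illustrative, not the track's actual settings):

```python
# Precision at cutoff k: the fraction of the top-k retrieved documents
# that are judged relevant. Only the top k documents need assessment,
# so the measure scales to collections too large to judge exhaustively.

def precision_at_k(ranked_docs, relevant, k):
    """ranked_docs: system ranking, best first; relevant: judged-relevant ids."""
    top = ranked_docs[:k]
    return sum(1 for d in top if d in relevant) / k
```

For example, a ranking whose top five documents contain two relevant ones scores P@5 = 0.4, regardless of how the remaining thousands of documents are ordered.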
conference on automated deduction | 1988
Michael A. McRobbie; Robert K. Meyer; Paul B. Thistlewaite
In this paper we give an introduction to a technique for greatly increasing the efficiency of automated theorem provers for non-standard logics. This technique takes advantage of the fact that while most important non-standard logics do not have finite characteristic models (in the sense that truth tables are a finite characteristic model for classical propositional logic), they do have finite models. These models validate all the theorems of a given logic, though they validate some non-theorems as well; they invalidate the remaining non-theorems. Hence this technique involves using the models to direct a search by an automated theorem prover for a proof, by filtering out or pruning the items of the search space which the models invalidate.
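The pruning idea can be sketched concretely. The sketch below is not the authors' prover: the three-valued tables are illustrative, not the matrices of any particular relevant logic. What it shows is the one-directional filter the abstract describes: a formula the finite model invalidates cannot be a theorem and is pruned from the search space, while a formula the model validates might still be a non-theorem and must be kept for the prover to settle.

```python
# Model-based pruning sketch: evaluate candidate formulas in a small finite
# model and discard any that fail to take a designated value under every
# assignment. The 3-valued tables here are illustrative only.
from itertools import product

NEG = [2, 1, 0]            # negation table, indexed by value 0..2
IMP = [[2, 2, 2],          # implication table: IMP[a][b]
       [0, 2, 2],
       [0, 0, 2]]
DESIGNATED = {1, 2}        # "true-like" values

def eval_formula(f, env):
    """Formulas are nested tuples: ('var', name), ('neg', g), ('imp', g, h)."""
    op = f[0]
    if op == 'var':
        return env[f[1]]
    if op == 'neg':
        return NEG[eval_formula(f[1], env)]
    if op == 'imp':
        return IMP[eval_formula(f[1], env)][eval_formula(f[2], env)]
    raise ValueError(op)

def variables(f):
    if f[0] == 'var':
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def model_validates(f):
    """True iff f takes a designated value under every assignment."""
    vs = sorted(variables(f))
    return all(
        eval_formula(f, dict(zip(vs, vals))) in DESIGNATED
        for vals in product(range(3), repeat=len(vs))
    )

def prune(candidates):
    """Keep only the candidates the finite model validates; the rest are
    provably non-theorems and need never enter the proof search."""
    return [f for f in candidates if model_validates(f)]
```

For instance, the identity a → a survives the filter (it is designated under every assignment in these tables), while a bare variable a is invalidated by the assignment giving it value 0 and is pruned immediately, without any proof search.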
Journal of Automated Reasoning | 1991
Paul B. Thistlewaite; Michael A. McRobbie
The introduction to this issue mentions three approaches that might be taken to non-classical theorem proving. During development of theorem-proving techniques for the non-classical relevant logic LR, reported in Thistlewaite, McRobbie and Meyer [1], the second approach was taken: namely, a Gentzen-based deductive formulation of the logic was implemented. The hardest set of problems decided in this way involved proving that certain formulas in fact define binary connectives that are associative in LR. The motivation for doing so was to show the relevant logic R undecidable by defining an appropriately free associative connective within R to act like the desired semigroup operation in the manner of Post. This work was interrupted by Alistair Urquhart's proof of the undecidability of R using techniques from projective geometry. Nonetheless, we believe that this set of problems probably marks the limits of competence of proof-theoretic theorem provers for LR. An unresolved question is whether either of the other two approaches to non-classical ATP might prove more efficacious for LR, and by implication, perhaps for non-classical logics generally. A first step towards resolving this question would be proving, using either of the other two approaches, that the set of formulas (F1) to (F16) below define binary connectives that are associative in LR.
text retrieval conference | 1995
David Hawking; Paul B. Thistlewaite
Archive | 1988
Paul B. Thistlewaite; Michael A. McRobbie; Robert K. Meyer
Archive | 1996
David Hawking; Paul B. Thistlewaite
international world wide web conferences | 1996
Paul B. Thistlewaite; Steve Ball