Richard A. O'Keefe | Researchain

Publication

Featured research published by Richard A. O'Keefe.

conference on information and knowledge management | 2013

Maintaining discriminatory power in quantized indexes

Matt Crane; Andrew Trotman; Richard A. O'Keefe

The time cost of searching with an inverted index is directly proportional to the number of postings processed and the cost of processing each posting. Dynamic pruning reduces the number of postings examined. Pre-calculation then quantization of term / document weights reduces the cost of evaluating each posting. The effect of quantization on precision, latency, and index size is examined herein. We show empirically that there is an ideal size (in bits) for storing the quantized scores. Increasing this adversely affects index size and search latency; decreasing it adversely affects precision. We observe a relationship between the collection size and ideal quantization size, and provide a way to determine the number of bits to use from the collection size.
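To make the precision trade-off concrete, here is a minimal sketch of uniform quantization of pre-calculated scores into a fixed number of bits. The abstract does not say which quantization scheme the authors use, so the uniform mapping, the function names `quantize` and `distinct_levels`, and the sample scores are all assumptions for illustration only.

```python
def quantize(score: float, lo: float, hi: float, bits: int) -> int:
    """Map a score in [lo, hi] to an integer in [0, 2**bits - 1].

    Uniform quantization is an assumption here; the paper may use
    a different mapping.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    levels = (1 << bits) - 1
    clamped = min(max(score, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def distinct_levels(scores, bits: int) -> int:
    """Count how many different quantized values a set of scores yields."""
    lo, hi = min(scores), max(scores)
    return len({quantize(s, lo, hi, bits) for s in scores})


# Hypothetical term/document weights, chosen only for illustration.
scores = [0.0, 0.02, 0.6, 1.0]
few_bits = distinct_levels(scores, 2)   # close scores collapse together
more_bits = distinct_levels(scores, 8)  # every score stays distinguishable
```

With 2 bits the scores 0.0 and 0.02 fall into the same bucket, so the index can no longer rank those two postings apart; with 8 bits all four remain distinct, at the cost of more storage per posting.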

parallel and distributed computing: applications and technologies | 2008

Application-Specific Disk I/O Optimisation for a Search Engine

Xiang-Fei Jia; Andrew Trotman; Richard A. O'Keefe; Zhiyi Huang

Operating systems provide only general-purpose I/O optimisation, since they have to service various types of applications. However, application-level I/O optimisation can achieve better performance, since an application knows better how to optimise its own disk I/O. In this paper we provide a solution for application-specific I/O optimisation for a search engine. It shows a 28% improvement when compared to the general-purpose I/O optimisation of Linux. Our results also show an 11% improvement when the Linux I/O optimisation is bypassed.

australasian document computing symposium | 2012

A study in language identification

Rachel Mary Milne; Richard A. O'Keefe; Andrew Trotman

Language identification is the task of automatically determining the language in which a previously unseen document was written. We compared several prior methods on samples from the Wikipedia and the EuroParl collections. Most of these methods work well. However, we find that these (and presumably other) document collections are heterogeneous in size, and that short documents are systematically different from long ones. Techniques that work well on long documents differ from those that work well on short ones. We believe that algorithms will improve if length is taken into account.

INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval | 2004

If INEX is the answer, what is the question?

Richard A. O'Keefe

The INEX query languages allow the extraction of fragments from selected documents. This power is not much used in INEX queries. The paper suggests reasons why, and considers which kind of document collection this feature might be useful for.

Theory and Practice of Logic Programming | 2001

O(1) reversible tree navigation without cycles

Richard A. O'Keefe

Imperative programmers often use cyclically linked trees in order to achieve O(1) navigation time to neighbours. Some logic programmers believe that cyclic terms are necessary to achieve the same in logic-based languages. An old but little-known technique provides O(1) time and space navigation without cyclic links, in the form of reversible predicates. A small modification provides O(1) amortised time and space editing.
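The abstract does not name the technique. One well-known approach with the stated properties is the zipper, where a position in the tree is a pair of the focused subtree and the path back to the root. The sketch below assumes that reading and uses Python rather than a logic language; `Node`, `Loc`, `down_left`, `down_right`, and `up` are hypothetical names, not taken from the paper.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    left: Optional["Node"]
    value: Any
    right: Optional["Node"]


class Loc(NamedTuple):
    focus: Optional[Node]
    # () at the root, otherwise (side, parent_value, sibling, parent_path)
    path: tuple


def down_left(loc: Loc) -> Loc:
    """Move to the left child in O(1), remembering how to get back."""
    n = loc.focus
    return Loc(n.left, ("L", n.value, n.right, loc.path))


def down_right(loc: Loc) -> Loc:
    """Move to the right child in O(1), remembering how to get back."""
    n = loc.focus
    return Loc(n.right, ("R", n.value, n.left, loc.path))


def up(loc: Loc) -> Loc:
    """Undo one downward step in O(1) by rebuilding the parent node."""
    side, value, sibling, rest = loc.path
    if side == "L":
        return Loc(Node(loc.focus, value, sibling), rest)
    return Loc(Node(sibling, value, loc.focus), rest)


tree = Node(Node(None, 1, None), 2, Node(None, 3, None))
root = Loc(tree, ())
```

No node holds a pointer to its parent, so the structure contains no cycles; the parent is reconstructed from the path when moving up. Replacing `loc.focus` with a new subtree and then moving up gives an edited tree without mutating the original.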

australasian joint conference on artificial intelligence | 2012

Inference of a phylogenetic tree: hierarchical clustering versus genetic algorithm

Glenn Blanchette; Richard A. O'Keefe; Lubica Benuskova

This paper compares the implementations and performance of two computational methods, hierarchical clustering and a genetic algorithm, for inference of phylogenetic trees in the context of the artificial organism Caminalcules. Although these techniques have a superficial similarity, in that they both use agglomeration as their construction method, their origin and approaches are antithetical. For a small problem space of the original species proposed by Camin (1965), the genetic algorithm was able to produce a solution which had a lower Fitch cost and was closer to the theoretical evolution of Caminalcules. Unfortunately, for larger problem sizes its time cost increased exponentially, making the greedy directed search of the agglomerative clustering algorithm a more efficient approach.
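For readers unfamiliar with the term, the sketch below assumes that "Fitch cost" means the parsimony score from Fitch's small-parsimony algorithm for a single character on a rooted binary tree. The tree encoding (nested 2-tuples with character states at the leaves) and the name `fitch` are illustrative assumptions, not the authors' implementation.

```python
def fitch(tree):
    """Return (candidate_states, cost) for one character on a binary tree.

    A leaf is a single state (e.g. "A"); an internal node is a
    (left, right) tuple. The cost counts the state changes that the
    bottom-up Fitch pass is forced to introduce.
    """
    if not isinstance(tree, tuple):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        # The children can agree on a state, so no change is needed here.
        return common, left_cost + right_cost
    # The children disagree, so one change is charged at this node.
    return left_states | right_states, left_cost + right_cost + 1


# Two hypothetical topologies over the same four leaf states.
_, cost_a = fitch((("A", "A"), ("C", "C")))
_, cost_b = fitch((("A", "C"), ("A", "C")))
```

Here the first topology groups identical states together and needs one change, while the second needs two, so a search procedure minimising this cost would prefer the first.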

australasian document computing symposium | 2013

Malformed UTF-8 and spam

Matt Crane; Andrew Trotman; Richard A. O'Keefe

In this paper we discuss some of the document encoding errors that were found when scaling our indexer and search engine up to large collections crawled from the web, such as ClueWeb09. We describe the encoding errors, what effect they could have on indexing and searching, how they are processed within our indexer and search engine, and how they relate to the quality of the page as measured by another method.
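As a small illustration of what detecting an encoding error can look like, the sketch below uses Python's strict UTF-8 decoder to find the first malformed byte sequence in a document. It is not the authors' indexer logic, which the abstract does not describe; `first_utf8_error` is a hypothetical helper.

```python
def first_utf8_error(data: bytes):
    """Return (byte_offset, reason) for the first malformed sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


clean = "naïve café".encode("utf-8")
stray_byte = b"ab\xff"        # 0xFF can never appear in UTF-8
overlong = b"\xc0\xaf"        # overlong encoding of "/", rejected by the decoder
truncated = "é".encode("utf-8")[:1]  # lead byte with its continuation cut off
```

An indexer could use such a check to decide whether to skip, repair, or flag a page, and the count of errors per page could serve as one input when comparing against an independent quality measure.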

Theory and Practice of Logic Programming | 2012

Coding guidelines for Prolog

Michael A. Covington; Roberto Bagnara; Richard A. O'Keefe; Jan Wielemaker; Simon Price

Coding standards and good practices are fundamental to a disciplined approach to software projects, irrespective of the programming languages being employed. Prolog programming can benefit from such an approach, perhaps more than programming in other languages. Despite this, no widely accepted standards and practices seem to have emerged till now. The present paper is a first step toward filling this void: it provides immediate guidelines for code layout, naming conventions, documentation, proper use of Prolog features, program development, debugging, and testing. Presented with each guideline is its rationale and, where sensible options exist, illustrations of the relative pros and cons of each alternative. A coding standard should always be selected on a per-project basis, based on a host of issues pertinent to any given programming project; for this reason the paper goes beyond the mere provision of normative guidelines by discussing key factors and important criteria that should be taken into account when deciding on a full-fledged coding standard for the project.

international conference on intelligent information processing | 2006