C. J. van Rijsbergen
University of Glasgow
Publications
Featured research published by C. J. van Rijsbergen.
Information Processing and Management | 2002
Anastasios Tombros; Robert Villa; C. J. van Rijsbergen
Hierarchic document clustering has been widely applied to information retrieval (IR) on the grounds of its potential for improved effectiveness over inverted file search (IFS). However, previous research has been inconclusive as to whether clustering does bring improvements. In this paper we take the view that if hierarchic clustering is applied to search results (query-specific clustering), then it has the potential to increase retrieval effectiveness compared both to static clustering and to conventional IFS. We conducted a number of experiments using five document collections and four hierarchic clustering methods. Our results show that the effectiveness of query-specific clustering is indeed higher, and suggest that there is scope for its application to IR.
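The abstract describes clustering only the documents retrieved for a query rather than the whole collection. The sketch below illustrates that idea under stated assumptions: raw term-frequency vectors, cosine similarity, and group-average agglomerative clustering stopped at a similarity `threshold`. The function name `query_specific_clusters` and the parameters `top_k` and `threshold` are hypothetical, not taken from the paper, which compared four hierarchic methods.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_specific_clusters(docs, query, top_k=5, threshold=0.2):
    """Rank docs against the query, then cluster only the top_k results.

    Uses group-average agglomerative clustering (an illustrative choice)
    and stops merging once the best pair falls below `threshold`.
    Returns clusters as lists of indices into `docs`.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    q = Counter(query.lower().split())
    ranked = sorted(range(len(docs)), key=lambda i: cosine(vecs[i], q), reverse=True)[:top_k]
    clusters = [[i] for i in ranked]

    def avg_sim(c1, c2):
        # Group-average link: mean pairwise similarity between two clusters.
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        if avg_sim(clusters[a], clusters[b]) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For example, with the documents "apple banana fruit", "banana fruit salad", "car engine motor", "motor car repair" and "apple pie recipe", the query "fruit car" and `top_k=4`, the two fruit documents and the two car documents form separate clusters, while the unretrieved fifth document is never clustered at all.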
ACM Transactions on Information Systems | 2005
Ryen W. White; Ian Ruthven; Joemon M. Jose; C. J. van Rijsbergen
In this article we describe an evaluation of relevance feedback (RF) algorithms using searcher simulations. Since these algorithms select additional terms for query modification based on inferences made from searcher interaction, not on relevance information searchers explicitly provide (as in traditional RF), we refer to them as implicit feedback models. We introduce six different models that base their decisions on the interactions of searchers and use different approaches to rank query modification terms. The aim of this article is to determine which of these models should be used to assist searchers in the systems we develop. To evaluate these models we used searcher simulations that afforded us more control over the experimental conditions than experiments with human subjects and allowed complex interaction to be modeled without the need for costly human experimentation. The simulation-based evaluation methodology measures how well the models learn the distribution of terms across relevant documents (i.e., learn what information is relevant) and how well they improve search effectiveness (i.e., create effective search queries). Our findings show that an implicit feedback model based on Jeffrey's rule of conditioning outperformed the other models under investigation.
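Jeffrey's rule of conditioning revises the probability of a proposition A when evidence shifts the probabilities over a partition {E_i} without making any single E_i certain: P_new(A) = Σ_i P(A | E_i) · P_new(E_i), assuming the conditional probabilities P(A | E_i) stay fixed. A minimal sketch of the rule itself follows. How the article's implicit feedback model maps searcher interaction onto the partition is not described here, so the two-event partition (`relevant` / `not_relevant`) and the numbers are hypothetical.

```python
def jeffrey_update(p_a_given_e: dict[str, float], p_new_e: dict[str, float]) -> float:
    """Jeffrey's rule: P_new(A) = sum_i P(A | E_i) * P_new(E_i).

    p_a_given_e: P(A | E_i) for each cell E_i of the partition (held fixed).
    p_new_e:     revised probabilities P_new(E_i); must sum to 1.
    """
    total = sum(p_new_e.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("revised evidence probabilities must sum to 1")
    return sum(p_a_given_e[e] * p for e, p in p_new_e.items())


# Hypothetical example: A = "this term is useful for query modification".
conditionals = {"relevant": 0.8, "not_relevant": 0.1}
revised = {"relevant": 0.7, "not_relevant": 0.3}  # belief after interaction
p_term = jeffrey_update(conditionals, revised)
```

Here interaction has raised the probability of relevance to 0.7, so the term's probability becomes 0.8 · 0.7 + 0.1 · 0.3 = 0.59.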
ACM Conference on Hypertext | 1993
Mark D. Dunlop; C. J. van Rijsbergen
This paper discusses aspects of multimedia document bases and how access to documents held on a computer-based system can be achieved; in particular, it discusses the current access methods of hypermedia and free-text information retrieval. Browsing-based hypermedia systems provide ease of use for novice users and equal access to any media; however, they typically perform poorly with very large document bases. In contrast, query-based free-text retrieval systems are typically designed to work with very large document bases but have very poor multimedia capabilities. This paper presents a hybrid of these two traditional fields of information retrieval, together with a technique for using contextual information to provide access, through query, to documents that cannot be accessed by content (e.g., images). Two experiments carried out to test this approach are then presented. Finally, the paper briefly discusses a prototype implementation, which provides access to mixed-media information by query or browsing, along with user-interface issues.
International Symposium on Neural Networks | 2004
L. Azzopardi; Mark A. Girolami; C. J. van Rijsbergen
We propose a topic-based approach to language modelling for ad hoc information retrieval (IR). Many smoothed estimators used for the multinomial query model in IR rely upon the estimated background collection probabilities. In this paper, we propose a topic-based language modelling approach that uses a more informative prior based on the topical content of a document. In our experiments, the proposed model provides IR performance comparable to the standard models, but when combined in a two-stage language model, it outperforms all other estimated models.
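The abstract mentions smoothed estimators that rely on background collection probabilities and a two-stage language model. One common two-stage form is sketched below: Dirichlet smoothing of the document model towards a prior, then linear interpolation with a background model, p(w | d) = (1 − λ) · (c(w, d) + μ · p(w | prior)) / (|d| + μ) + λ · p(w | background). Passing a topic-based distribution as `prior` is only an assumption about where the paper's more informative prior would plug in; the function name and the defaults for `mu` and `lam` are arbitrary.

```python
import math
from collections import Counter


def two_stage_score(query, doc, prior, background, mu=2000.0, lam=0.5):
    """Query log-likelihood under two-stage smoothing (illustrative sketch).

    query, doc:  lists of tokens.
    prior:       dict word -> prob used for Dirichlet smoothing (stage 1);
                 the collection model, or a hypothetical topic-based model.
    background:  dict word -> prob interpolated in stage 2 with weight lam.
    """
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for w in query:
        # Stage 1: Dirichlet smoothing towards the prior.
        p_dir = (tf[w] + mu * prior.get(w, 0.0)) / (dlen + mu)
        # Stage 2: linear interpolation with the background model.
        p = (1 - lam) * p_dir + lam * background.get(w, 0.0)
        if p == 0.0:
            return float("-inf")  # query word unseen everywhere
        score += math.log(p)
    return score
```

A document that contains the query terms receives a higher log-likelihood than one that does not, while smoothing keeps the score finite for documents missing some query terms.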
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1988
Alan F. Smeaton; C. J. van Rijsbergen
Traditional information retrieval has relied on the extensive use of statistical parameters in the implementation of retrieval strategies. This paper sets out to investigate whether linguistic processes can be used as part of a document retrieval strategy. This is done by predefining a level of syntactic analysis of user queries only, to be used as part of the retrieval process. We report a large series of experiments on an experimental test collection that use a noun-phrase parser as part of the retrieval strategy. The experiments do yield improvements in retrieval effectiveness, which, given the crude linguistic process used and the fact that it was applied to queries rather than document texts, suggests that using linguistic processing in retrieval is a valid approach.
European Conference on Information Retrieval | 2004
Ryen W. White; Joemon M. Jose; C. J. van Rijsbergen; Ian Ruthven
In this paper we report on a study of implicit feedback models for unobtrusively tracking the information needs of searchers. Such models use relevance information gathered from searcher interaction and can be a potential substitute for explicit relevance feedback. We introduce a variety of implicit feedback models designed to enhance an Information Retrieval (IR) system’s representation of searchers’ information needs. To benchmark their performance we use a simulation-centric evaluation methodology that measures how well each model learns relevance and improves search effectiveness. The results show that a heuristic-based binary voting model and one based on Jeffrey’s rule of conditioning [5] outperform the other models under investigation.
Information Processing and Management | 1990
Tengku M.T. Sembok; C. J. van Rijsbergen
This paper introduces a logical-linguistic model of document retrieval systems and describes an implementation of a system called SILOL, which is based on this model. SILOL uses a shallow semantic translation of natural language texts into a first-order predicate representation to perform document indexing and retrieval. Some preliminary experiments have been carried out to test the retrieval effectiveness of this system. The results show improvements in retrieval effectiveness, which demonstrate that using a semantic theory of natural language and logic in document retrieval systems is a valid approach.
Journal of the Association for Information Science and Technology | 1996
C. J. van Rijsbergen; Mounia Lalmas
Information is, and always has been, an elusive concept; nevertheless, many philosophers, mathematicians, logicians and computer scientists have felt that it is fundamental. Many attempts have been made to arrive at a sensible and intuitively acceptable definition of information; so far, none has succeeded. This work is based on the approach followed by Dretske, Barwise, and Devlin, who held that the notion of information starts from the position that, given an ontology of objects individuated by a cognitive agent, it makes sense to speak of the information an object (e.g., a text, an image, a video) contains about another object (e.g., the query). This phenomenon is captured by the flow of information between objects, and exploiting it is the task of an Information Retrieval system. These authors proposed a theory of information that analyses the concept of information (of any type, from any medium) and the manner in which intelligent organisms (referred to as cognitive agents) handle and respond to the information they pick up from their environment. They defined the nature of information flow and the mechanisms that give rise to such a flow. The theory, which is based on Situation Theory, is expressed as a calculus defined on channels, designed so that it satisfies the properties attributed to information and its flows. This paper demonstrates the connection between this calculus and Information Retrieval, and proposes a model of an Information Retrieval system based on it.
Knowledge and Information Systems | 2004
Anastasios Tombros; C. J. van Rijsbergen
The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the cluster hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other and therefore tend to appear in the same clusters. In this paper we propose an axiomatic view of the hypothesis by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other that is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Our research describes an attempt to devise means by which this similarity can be detected. We propose the use of query-sensitive similarity measures that bias interdocument relationships toward pairs of documents that jointly possess attributes expressed in a query. We experimentally tested three query-sensitive measures against conventional ones that do not take the query into account, and we also examined the comparative effectiveness of the three query-sensitive measures. We calculated interdocument relationships for varying numbers of top-ranked documents for six document collections. Our results show a consistent and significant increase in the number of relevant documents that become nearest neighbors of any given relevant document when query-sensitive measures are used. These results suggest that the effectiveness of a cluster-based information retrieval system has the potential to increase through the use of query-sensitive similarity measures.
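The abstract does not give the formulas of the three query-sensitive measures, so the sketch below shows one plausible form only: the ordinary cosine between two documents is multiplied by the cosine between the query and a vector of the terms the two documents share. Weighting each shared term by the smaller of its two document weights is an assumption, as is the function name `query_sensitive_sim`.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_sensitive_sim(d1: dict[str, float], d2: dict[str, float],
                        query: dict[str, float]) -> float:
    """Illustrative query-sensitive similarity (hypothetical form).

    Biases the plain document-document cosine towards pairs whose shared
    terms also appear in the query.
    """
    # Common-terms vector: terms in both documents, weighted by the smaller
    # of the two weights (an assumption, not taken from the paper).
    common = {t: min(w, d2[t]) for t, w in d1.items() if t in d2}
    return cosine(d1, d2) * cosine(common, query)
```

Two document pairs can have the same plain cosine (here 2/3), yet only the pair whose shared terms include a query term keeps a non-zero query-sensitive similarity. This is the kind of bias towards co-relevant documents that the abstract describes.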
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1995
Fabio Crestani; C. J. van Rijsbergen
In this paper we discuss the dynamics of probabilistic term weights in different IR models. We present four models based on different notions of retrieval. Two of these are classical probabilistic models long in use in IR; the other two are based on a logical technique for evaluating the probability of a conditional, called Imaging, and one of these is a generalisation of the other. We analyse the transfer of probabilities occurring in the representation space at retrieval time for these four models, compare their retrieval performance using classical test collections, and discuss the results.
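Imaging, as applied to retrieval, can be read as a transfer of probability between terms. Each term that does not occur in the document moves its prior probability to the most similar term that does. The score is the total probability that ends up on query terms. The sketch below covers only the simplest variant, in which each term's probability moves to a single nearest term. The term prior `prior`, the similarity function `sim`, the alphabetical tie-breaking and the name `imaging_score` are all illustrative assumptions.

```python
def imaging_score(doc_terms, query_terms, prior, sim):
    """Probability of the conditional d -> q by simple imaging (sketch).

    doc_terms:   terms occurring in the document (non-empty).
    query_terms: set of query terms.
    prior:       dict term -> prior probability over the term space.
    sim:         function (t, u) -> similarity of term t to term u.
    """
    doc = sorted(set(doc_terms))  # sorted so ties in max() resolve deterministically
    moved = {t: 0.0 for t in doc}
    for t, p in prior.items():
        # t_d: t itself if it occurs in the document, else its most similar document term.
        target = t if t in moved else max(doc, key=lambda u: sim(t, u))
        moved[target] += p
    # Sum the probability that now sits on terms where the query is "true".
    return sum(p for t, p in moved.items() if t in query_terms)


SIM = {("cat", "dog"): 0.8, ("cat", "car"): 0.1, ("pet", "dog"): 0.9, ("pet", "car"): 0.0}
prior = {"cat": 0.25, "dog": 0.25, "pet": 0.25, "car": 0.25}
score = imaging_score(["dog", "car"], {"dog"}, prior, lambda a, b: SIM.get((a, b), 0.0))
```

With `dog` and `car` in the document, the probability of `cat` and `pet` flows to `dog`, so the query {dog} scores 0.75 rather than the 0.25 it would receive from `dog`'s own prior alone.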