Maarten de Rijke | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Maarten de Rijke is active.

Explore More

Publication

Featured researches published by Maarten de Rijke.

international acm sigir conference on research and development in information retrieval | 2006

Formal models for expert finding in enterprise corpora

Krisztian Balog; Leif Azzopardi; Maarten de Rijke

Searching an organizations document repositories for experts provides a cost effective solution for the task of expert finding. We present two general strategies to expert searching given a document collection which are formalized using generative probabilistic models. The first of these directly models an experts knowledge based on the documents that they are associated with, whilst the second locates documents on topic, and then finds the associated expert. Forming reliable associations is crucial to the performance of expert finding systems. Consequently, in our evaluation we compare the different approaches, exploring a variety of associations along with other operational parameters (such as topicality). Using the TREC Enterprise corpora, we show that the second strategy consistently outperforms the first. A comparison against other unsupervised techniques, reveals that our second model delivers excellent performance.

web search and data mining | 2012

Adding semantics to microblog posts

Edgar Meij; Wouter Weerkamp; Maarten de Rijke

Microblogs have become an important source of information for the purpose of marketing, intelligence, and reputation management. Streams of microblogs are of great value because of their direct and real-time nature. Determining what an individual microblog post is about, however, can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication. We propose a solution to the problem of determining what a microblog post is about through semantic linking: we add semantics to posts by automatically identifying concepts that are semantically related to it and generating links to the corresponding Wikipedia articles. The identified concepts can subsequently be used for, e.g., social media mining, thereby reducing the need for manual inspection and selection. Using a purpose-built test collection of tweets, we show that recently proposed approaches for semantic linking do not perform well, mainly due to the idiosyncratic nature of microblog posts. We propose a novel method based on machine learning with a set of innovative features and show that it is able to achieve significant improvements over all other methods, especially in terms of precision.

Lecture Notes in Computer Science | 2007

ENSM-SE at CLEF 2006 : Fuzzy Proximity Method with an Adhoc Influence Function in Evaluation of Multilingual and Multi-modal Information Retrieval 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain

Carol Peters; Paul D. Clough; Fredric C. Gey; Jussi Karlgren; Bernardo Magnini; Douglas W. Oard; Maarten de Rijke; Maximilian Stempfhuber

This book constitutes the thoroughly refereed postproceedings of the 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, held in Alicante, Spain, September 2006. The revised papers presented together with an introduction were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections on Multilingual Textual Document Retrieval, Domain-Specifig Information Retrieval, i-CLEF, QA@CLEF, ImageCLEF, CLSR, WebCLEF and GeoCLEF.We experiment a new influence function in our information retrieval method that uses the degree of fuzzy proximity of key terms in a document to compute the relevance of the document to the query. The model is based on the idea that the closer the query terms in a document are to each other the more relevant the document. Our model handles Boolean queries but, contrary to the traditional extensions of the basic Boolean information retrieval model, does not use a proximity operator explicitly. A single parameter makes it possible to control the proximity degree required. To improve our system we use a stemming algorithm before indexing, we take a specific influence function and we merge fuzzy proximity result lists built with different width of influence function. We explain how we construct the queries and report the results of our experiments in the ad-hoc monolingual French task of the CLEF 2006 evaluation campaign.

european conference on information retrieval | 2006

A study of blog search

Gilad Mishne; Maarten de Rijke

We present an analysis of a large blog search engine query log, exploring a number of angles such as query intent, query topics, and user sessions. Our results show that blog searches have different intents than general web searches, suggesting that the primary targets of blog searchers are tracking references to named entities, and locating blogs by theme. In terms of interest areas, blog searchers are, on average, more engaged in technology, entertainment, and politics than web searchers, with a particular interest in current events. The user behavior observed is similar to that in general web search: short sessions with an interest in the first few results only.

Information Processing and Management | 2009

A language modeling framework for expert finding

Krisztian Balog; Leif Azzopardi; Maarten de Rijke

Statistical language models have been successfully applied to many information retrieval tasks, including expert finding: the process of identifying experts given a particular topic. In this paper, we introduce and detail language modeling approaches that integrate the representation, association and search of experts using various textual data sources into a generative probabilistic framework. This provides a simple, intuitive, and extensible theoretical framework to underpin research into expertise search. To demonstrate the flexibility of the framework, two search strategies to find experts are modeled that incorporate different types of evidence extracted from the data, before being extended to also incorporate co-occurrence information. The models proposed are evaluated in the context of enterprise search systems within an intranet environment, where it is reasonable to assume that the list of experts is known, and that data to be mined is publicly accessible. Our experiments show that excellent performance can be achieved by using these models in such environments, and that this theoretical and empirical work paves the way for future principled extensions.

european conference on information retrieval | 2011

Incorporating query expansion and quality indicators in searching microblog posts

Kamran Massoudi; Manos Tsagkias; Maarten de Rijke; Wouter Weerkamp

We propose a retrieval model for searching microblog posts for a given topic of interest. We develop a language modeling approach tailored to microblogging characteristics, where redundancy-based IR methods cannot be used in a straightforward manner. We enhance this model with two groups of quality indicators: textual and microblog specific. Additionally, we propose a dynamic query expansion model for microblog post retrieval. Experimental results on Twitter data reveal the usefulness of boolean search, and demonstrate the utility of quality indicators and query expansion in microblog search

Foundations and Trends in Information Retrieval archive | 2012

Expertise Retrieval

Krisztian Balog; Yi Fang; Maarten de Rijke; Pavel Serdyukov; Luo Si

People have looked for experts since before the advent of computers. With advances in information retrieval technology and the large-scale availability of digital traces of knowledge-related activities, computer systems that can fully automate the process of locating expertise have become a reality. The past decade has witnessed tremendous interest, and a wealth of results, in expertise retrieval as an emerging subdiscipline in information retrieval. This survey highlights advances in models and algorithms relevant to this field. We draw connections among methods proposed in the literature and summarize them in five groups of basic approaches. These serve as the building blocks for more advanced models that arise when we consider a range of content-based factors that may impact the strength of association between a topic and a person. We also discuss practical aspects of building an expert search system and present applications of the technology in other domains, such as blog distillation and entity retrieval. The limitations of current approaches are also pointed out. We end our survey with a set of conjectures on what the future may hold for expertise retrieval research.

Proceedings of the 3rd international workshop on Link discovery | 2005

Discovering missing links in Wikipedia

Sisay Fissaha Adafre; Maarten de Rijke

In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.

conference of the european chapter of the association for computational linguistics | 2006

Why are they excited?: identifying and explaining spikes in blog mood levels

Krisztian Balog; Gilad Mishne; Maarten de Rijke

We describe a method for discovering irregularities in temporal mood patterns appearing in a large corpus of blog posts, and labeling them with a natural language explanation. Simple techniques based on comparing corpus frequencies, coupled with large quantities of data, are shown to be effective for identifying the events underlying changes in global moods.

international acm sigir conference on research and development in information retrieval | 2007

Broad expertise retrieval in sparse data environments

Krisztian Balog; Toine Bogers; Leif Azzopardi; Maarten de Rijke; Antal van den Bosch

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.

Explore More