Marc Bron
University of Amsterdam
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Marc Bron.
international semantic web conference | 2009
Edgar Meij; Marc Bron; Laura Hollink; Bouke Huurnink; Maarten de Rijke
An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that are able to return more useful search results than the original query and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion , a special type of query transformation based on identifying semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. We evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts using a manually developed test bed and show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.
conference on information and knowledge management | 2010
Marc Bron; Krisztian Balog; Maarten de Rijke
Related entity finding is the task of returning a ranked list of homepages of relevant entities of a specified type that need to engage in a given relationship with a given source entity. We propose a framework for addressing this task and perform a detailed analysis of four core components; co-occurrence models, type filtering, context modeling and homepage finding. Our initial focus is on recall. We analyze the performance of a model that only uses co-occurrence statistics. While this method identifies the potential set of related entities, it fails to rank them effectively. Two types of error emerge: (1) entities of the wrong type pollute the ranking and (2) while somehow associated to the source entity, some retrieved entities do not engage in the right relation with it. To address (1), we add type filtering based on category information available in Wikipedia. To correct for (2), we complement our related entity finding method with contextual information, represented as language models derived from documents in which source and target entities co-occur. To complete the pipeline, we find homepages of top ranked entities by combining a language modeling approach with heuristics based on Wikipedias external links. Our method achieves very high recall scores on the end-to-end task, providing a solid starting point for expanding our focus to improve precision. Our framework can effectively incorporate additional heuristics and these extensions lead to state-of-the-art performance.
european conference on information retrieval | 2010
Krisztian Balog; Marc Bron; Maarten de Rijke
Users often search for entities instead of documents and in this setting are willing to provide extra input, in addition to a query, such as category information and example entities. We propose a general probabilistic framework for entity search to evaluate and provide insight in the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation, and also demonstrate the effectiveness of category-based expansion using example entities. Our best performing model shows very competitive performance on the INEX-XER entity ranking and list completion tasks.
theory and practice of digital libraries | 2011
Marc Bron; Bouke Huurnink; Maarten de Rijke
News, multimedia and cultural heritage archives are increasingly offering opportunities to create connections between their collections. We consider the task of linking archives: connecting an item in one archive to one or more items in other, often complementary archives. We focus on a specific instance of the task: linking items with a rich textual representation in a news archive to items with sparse annotations in a multimedia archive, where items should be linked if they describe the same or a related event. We find that the difference in textual richness of annotations presents a challenge and investigate two approaches: (i) to enrich sparsely annotated items with textually rich content; and (ii) to reduce rich news archive items using term selection. We demonstrate the positive impact of both approaches on linking to same events and linking to related events.
european conference on information retrieval | 2013
Marc Bron; Krisztian Balog; Maarten de Rijke
The scale of todays Web of Data motivates the use of keyword search-based approaches to entity-oriented search tasks in addition to traditional structure-based approaches, which require users to have knowledge of the underlying schema. We propose an alternative structure-based approach that makes use of example entities and compare its effectiveness with a text-based approach in the context of an entity list completion task. We find that both the text and structure-based approaches are effective in retrieving relevant entities, but that they find different sets of entities. Additionally, we find that the performance of the structure-based approach is dependent on the quality and number of example entities given. We experiment with a number of hybrid techniques that balance between the two approaches and find that a method that uses the example entities to determine the weights of approaches in the combination on a per query basis is most effective.
international acm sigir conference on research and development in information retrieval | 2013
Marc Bron; Jasmijn van Gorp; Frank Nack; Lotte Belice Baltussen; Maarten de Rijke
Aggregated search interfaces provide users with an overview of results from various sources. Two general types of display exist: tabbed, with access to each source in a separate tab, and blended, which combines multiple sources into a single result page. Multi-session search tasks, e.g., a research project, consist of multiple stages, each with its own sub-tasks. Several factors involved in multi-session search tasks have been found to influence user search behavior. We investigate whether user preference for source presentation changes during a multi-session search task. The dynamic nature of multi-session search tasks makes the design of a controlled experiment a non-trivial challenge. We adopt a methodology based on triangulation and conduct two types of observational study: a longitudinal study and a laboratory study. In the longitudinal study we follow the use of tabbed and blended displays by 25 students during a project. We find that while a tabbed display is used more than a blended display, subjects repeatedly switch between displays during the project. Use of the tabbed display is motivated by a need to zoom in on a specific source, while the blended display is used to explore available material across sources whenever the information need changes. In a laboratory study 44 students completed a multi-session search task composed of three sub-tasks, the first with a tabbed display, the second and third with blended displays. The tasks were manipulated by either providing three task about the same topic or about three different topics. We find that a stable information need over multiple sub-tasks negatively influences perceived usability of the blended displays, while we do not find an influence when the information need changes.
international acm sigir conference on research and development in information retrieval | 2010
Katja Hofmann; Bouke Huurnink; Marc Bron; Maarten de Rijke
Traditional retrieval evaluation uses explicit relevance judgments which are expensive to collect. Relevance assessments inferred from implicit feedback such as click-through data can be collected inexpensively, but may be less reliable. We compare assessments derived from click-through data to another source of implicit feedback that we assume to be highly indicative of relevance: purchase decisions. Evaluating retrieval runs based on a log of an audio-visual archive, we find agreement between system rankings and purchase decisions to be surprisingly high.
cross language evaluation forum | 2010
Bouke Huurnink; Katja Hofmann; Maarten de Rijke; Marc Bron
We design and validate simulators for generating queries and relevance judgments for retrieval system evaluation. We develop a simulation framework that incorporates existing and new simulation strategies. To validate a simulator, we assess whether evaluation using its output data ranks retrieval systems in the same way as evaluation using real-world data. The real-world data is obtained using logged commercial searches and associated purchase decisions. While no simulator reproduces an ideal ranking, there is a large variation in simulator performance that allows us to distinguish those that are better suited to creating artificial testbeds for retrieval experiments. Incorporating knowledge about document structure in the query generation process helps create more realistic simulators.
Proceedings of the First International Workshop on Gamification for Information Retrieval | 2014
Jiyin He; Marc Bron; Leif Azzopardi; Arjen P. de Vries
Typical crowdsourcing tasks ask workers to label images or make relevance judgements, as a low cost alternative to lab based user studies. More recently, gamification has been employed as a way to make these tasks more appealing and so users play, rather than work. One observation is that differences in task design and incentives elicits different player behavior. In this paper we discuss a new type of task, where we aim at eliciting player behavior that resembles user behavior when performing a search task. Care should be taken in the design of a gamified version of such a task to allow players to complete tasks with a limited amount of effort and time, without changing the behavior to be studied. We discuss the motivation of the abstractions and design choices we have made in achieving this goal. We then analyze whether and how these abstractions and design choices influence our observations of player behaviors.
Lecture Notes in Computer Science | 2009
Krisztian Balog; Marc Bron; Maarten de Rijke; Wouter Weerkamp
We describe our participation in the INEX 2009 Entity Ranking track. We employ a probabilistic retrieval model for entity search in which term-based and category-based representations of queries and entities are effectively integrated. We demonstrate that our approach achieves state-of-the-art performance on both the entity ranking and list completion tasks.