Börkur Sigurbjörnsson

Publication

Featured research published by Börkur Sigurbjörnsson.

international world wide web conferences | 2008

Flickr tag recommendation based on collective knowledge

Börkur Sigurbjörnsson; Roelof van Zwol

Online photo services such as Flickr and Zooomr allow users to share their photos with family, friends, and the online community at large. An important facet of these services is that users manually annotate their photos using so-called tags, which describe the contents of the photo or provide additional contextual and semantic information. In this paper we investigate how we can assist users in the tagging phase. The contribution of our research is twofold. We analyse a representative snapshot of Flickr and present the results by means of a tag characterisation focussing on how users tag photos and what information is contained in the tagging. Based on this analysis, we present and evaluate tag recommendation strategies to support the user in the photo annotation task by recommending a set of tags that can be added to the photo. The results of the empirical evaluation show that we can effectively recommend relevant tags for a variety of photos with different levels of exhaustiveness of original tagging.

web search and data mining | 2009

Classifying tags using open content resources

Simon E. Overell; Börkur Sigurbjörnsson; Roelof van Zwol

Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and wii.

international acm sigir conference on research and development in information retrieval | 2004

Length normalization in XML retrieval

Jaap Kamps; Maarten de Rijke; Börkur Sigurbjörnsson

XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full blown articles, is a potentially retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of extreme length priors for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate document length normalization. Even after increasing the minimal size of XML elements occurring in the index, the importance of an extreme length bias remains.

INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval | 2004

Narrowed extended XPath I (NEXI)

Andrew Trotman; Börkur Sigurbjörnsson

INEX has through the years provided two types of queries: Content-Only queries (CO) and Content-And-Structure queries (CAS). The CO language has not changed much, but the CAS language has been more problematic. For the CAS queries, the INEX 02 query language proved insufficient for specifying problems for INEX 03. This was addressed by using an extended version of XPath, which, in turn, proved too complex to use correctly. Recently, an INEX working group identified the minimal set of requirements for a suitable query language for future workshops. From this analysis, a new IR query language, NEXI, is introduced for upcoming workshops.

international world wide web conferences | 2010

Faceted exploration of image search results

Roelof van Zwol; Börkur Sigurbjörnsson; Ramu Adapala; Lluis Garcia Pueyo; Abhinav Katiyar; Kaushal Kurapati; Mridul Muralidharan; Sudar Muthu; Vanessa Murdock; Polly Ng; Anand Ramani; Anuj Sahai; Sriram Thiru Sathish; Hari Vasudev; Upendra Vuyyuru

This paper describes MediaFaces, a system that enables faceted exploration of media collections. The system processes semi-structured information sources to extract objects and facets, e.g. the relationships between two objects. Next, we rank the facets based on a statistical analysis of image search query logs, and the tagging behaviour of users annotating photos in Flickr. For a given object of interest, we can then retrieve the top-k most relevant facets and present them to the user. The system is currently deployed in production by Yahoo!'s image search engine. We present the system architecture, its main components, and the application of the system as part of the image search experience.

international acm sigir conference on research and development in information retrieval | 2003

XML retrieval: what to retrieve?

Jaap Kamps; Maarten Marx; Maarten de Rijke; Börkur Sigurbjörnsson

The fundamental difference between standard information retrieval and XML retrieval is the unit of retrieval. In traditional IR, the unit of retrieval is fixed: it is the complete document. In XML retrieval, every XML element in a document is a retrievable unit. This makes XML retrieval more difficult: besides being relevant, a retrieved unit should be neither too large nor too small. The research presented here, a comparative analysis of two approaches to XML retrieval, aims to shed light on which XML elements should be retrieved. The experimental evaluation uses data from the Initiative for the Evaluation of XML retrieval (INEX 2002).

Information Retrieval | 2005

The Importance of Length Normalization for XML Retrieval

Jaap Kamps; Maarten de Rijke; Börkur Sigurbjörnsson

XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full blown articles, is a retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length-bias introduced by the amount of smoothing, and show the importance of extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate element length normalization. Even after restricting the minimal size of XML elements occurring in the index, the importance of an extreme explicit length bias remains.

cross language evaluation forum | 2005

EuroGOV: engineering a multilingual web corpus

Börkur Sigurbjörnsson; Jaap Kamps; Maarten de Rijke

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian governmental web sites. The corpus contains over 3 million documents written in more than 20 different European languages. In this paper we provide a detailed description of the EuroGOV collection.

conference on information and knowledge management | 2004

Processing content-oriented XPath queries

Börkur Sigurbjörnsson; Jaap Kamps; Maarten de Rijke

Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the mark-up allows users to add structural constraints to their IR queries. Hybrid queries tend to be more expressive, which should lead, in principle, to better retrieval performance. In practice, the processing of these hybrid queries within an IR system turns out to be far from trivial, because a delicate balance between structural and content information needs to be sought. We propose an approach to processing such hybrid content-and-structure queries that decomposes a query into multiple content-only queries whose results are then combined in ways determined by the structural constraints of the original query. We evaluate our methods using the INEX 2003 test-suite, and show (1) that effective ways of processing content-oriented XPath queries are non-trivial, (2) that there are differences in effectiveness for different topic types, but (3) that with appropriate processing methods retrieval effectiveness can improve.

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval | 2005

The effect of structured queries and selective indexing on XML retrieval

Börkur Sigurbjörnsson; Jaap Kamps

We describe the University of Amsterdam’s participation in the INEX 2005 ad hoc track, covering the Thorough, Focused, and FetchBrowse tasks and their structured (+S) counterparts. Our research questions for this round of INEX were threefold. Our first and main research question was to investigate the contribution of structural constraints to improved retrieval performance. Our main results were that the two types of structural constraints have different effects. Constraining the target of result elements gives improvements in terms of early precision. Constraining the context of result elements improves mean average precision. Our second research question was to experiment with selective indexing strategies based on either the length of elements, the tag-name of elements considered relevant in earlier INEX years, or simply by indexing all sections or articles. Our experiments show that disregarding 80–90% of the total number of elements does not decrease retrieval performance. Third, we considered the automatic creation of structured queries using blind feedback. Here, our results are inconclusive, mainly due to the small number of queries used and the lack of a comparison with traditional blind feedback.