Anni Järvelin
University of Tampere
Publication
Featured research published by Anni Järvelin.
ACM Transactions on Information Systems | 2015
Kalervo Järvelin; Pertti Vakkari; Paavo Arvola; Feza Baskaya; Anni Järvelin; Jaana Kekäläinen; Heikki Keskustalo; Sanna Kumpulainen; Miamaria Saastamoinen; Reijo Savolainen; Eero Sormunen
Evaluation is central in research and development of information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is IR mechanisms’ capability of ranking relevant documents optimally for the users, given a query. Searching for information in practice involves searchers, however, and is highly interactive. When human searchers have been incorporated in evaluation studies, the results have often suggested that better ranking does not necessarily lead to better search task, or work task, performance. Therefore, it is not clear which system or interface features should be developed to improve the effectiveness of human task performance. In the present article, we focus on the evaluation of task-based information interaction (TBII). We give special emphasis to learning tasks to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching information items, selecting between them, working with them, and synthesizing and reporting. These five generic activities contribute to task performance and outcome and can be supported by information systems. In an attempt toward task-based evaluation, we introduce program theory as the evaluation framework. Such evaluation can investigate whether a program consisting of TBII activities and tools works and how it works and, further, provides a causal description of program (in)effectiveness. Our goal in the present article is to structure TBII on the basis of the five generic activities and consider the evaluation of each activity using the program theory framework. Finally, we combine these activity-based program theories in an overall evaluation framework for TBII. Such an evaluation is complex due to the large number of factors affecting information interaction. Instead of presenting tested program theories, we illustrate how the evaluation of TBII should be accomplished using the program theory framework in the evaluation of systems and behaviors, and their interactions, comprehensively in context.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012
Richard Berendsen; Allan Hanbury; Mihai Lupu; Vivien Petras; Maarten de Rijke; Gianmaria Silvello; Maristella Agosti; Toine Bogers; Martin Braschler; Paul Buitelaar; Khalid Choukri; Giorgio Maria Di Nunzio; Nicola Ferro; Pamela Forner; Karin Friberg Heppin; Preben Hansen; Anni Järvelin; Birger Larsen; Ivano Masiero; Henning Müller; Florina Piroi; Giuseppe Santucci; Elaine G. Toms
The PROMISE network of excellence organized a two-day brainstorming workshop on 30th and 31st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. This document reports on the outcomes of this event and provides details about the six envisaged research lines: search applications; contextual evaluation; challenges in test collection design and exploitation; component-based evaluation; ongoing evaluation; and signal-aware evaluation. The ultimate goal of the PROMISE retreat is to stimulate and involve the research community along these research lines and to provide funding agencies with effective and scientifically sound ideas for coordinating and supporting information access research.
Association for Information Science and Technology | 2016
Anni Järvelin; Heikki Keskustalo; Eero Sormunen; Miamaria Saastamoinen; Kimmo Kettunen
The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of the test results was conducted. In the index of the historical newspaper collection, the occurrences of a word typically spread over many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
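To make the expansion setup concrete, the following Python sketch shows the general shape of such vocabulary-based query expansion: each contemporary query term is matched against the collection's index vocabulary and replaced by a disjunction of its most similar variants (around 30 per term, as in the study). The function names, the sample vocabulary, and the use of difflib as the similarity measure are illustrative assumptions; the study itself used s-gram matching and related approximate string matching methods.

```python
import difflib

def expansion_candidates(query_term, index_vocabulary, n_variants=30):
    """Rank the collection's index words by string similarity to a contemporary
    query term and keep the top variants. difflib's ratio stands in here for the
    s-gram and edit-distance measures used in the study."""
    return sorted(
        index_vocabulary,
        key=lambda w: difflib.SequenceMatcher(None, query_term, w).ratio(),
        reverse=True,
    )[:n_variants]

def expand_query(query_terms, index_vocabulary, n_variants=30):
    """Build a synonym-structured query: each original term becomes a
    disjunction of its most similar index-word variants."""
    return {term: expansion_candidates(term, index_vocabulary, n_variants)
            for term in query_terms}

# Hypothetical OCR'd and inflected index words from an 1800s Finnish collection.
vocab = ["keisari", "keisarin", "keisarille", "keifari", "kejsari", "sanomalehti"]
print(expand_query(["keisari"], vocab, n_variants=3))
```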
String Processing and Information Retrieval | 2008
Anni Järvelin; Antti Järvelin
Classified s-grams have been successfully used in cross-language information retrieval (CLIR) as an approximate string matching technique for translating out-of-vocabulary (OOV) words. For example, s-grams have consistently outperformed other approximate string matching techniques, such as edit distance or n-grams. The Jaccard coefficient has traditionally been used as an s-gram based string proximity measure. However, other proximity measures for s-gram matching have not been tested. In the current study, the performance of seven proximity measures for classified s-grams in a CLIR context was evaluated using eleven language pairs. The binary proximity measures generally performed better than their non-binary counterparts, but the difference depended mainly on the padding used with the s-grams. When no padding was used, the binary and non-binary proximity measures were nearly equal, although overall performance deteriorated.
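A hedged sketch of the technique under evaluation: classified s-grams are character bigrams grouped by how many characters are skipped between the two letters, and the proximity of two words combines per-class Jaccard coefficients. The skip classes {0} and {1, 2}, the underscore padding, and the simple averaging of class scores are assumptions chosen for illustration, not necessarily the exact configuration of the paper.

```python
from itertools import combinations

def s_grams(word, skip_classes=({0}, {1, 2}), pad="_"):
    """Build classified s-grams: character bigrams whose skip distance
    (characters jumped over between the two letters) falls into each class.
    Padding marks the word boundaries; pass pad="" for no padding."""
    w = pad + word + pad if pad else word
    grams = {}
    for cls in skip_classes:
        key = frozenset(cls)
        grams[key] = set()
        for i, j in combinations(range(len(w)), 2):
            if j - i - 1 in cls:          # number of characters skipped
                grams[key].add(w[i] + w[j])
    return grams

def jaccard(a, b):
    """Binary Jaccard coefficient between two s-gram sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def s_gram_similarity(word1, word2, **kw):
    """Average the per-class Jaccard coefficients into one proximity score."""
    g1, g2 = s_grams(word1, **kw), s_grams(word2, **kw)
    return sum(jaccard(g1[k], g2[k]) for k in g1) / len(g1)

print(round(s_gram_similarity("colour", "color"), 2))  # spelling variants: high
print(round(s_gram_similarity("colour", "paint"), 2))  # unrelated words: low
```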
Proceedings of the 2011 workshop on Data infrastructurEs for supporting information retrieval evaluation | 2011
Jussi Karlgren; Anni Järvelin; Gunnar Eriksson; Preben Hansen
Information access research and development, and information retrieval especially, is based on quantitative and systematic benchmarking. Benchmarking of a computational mechanism is always based on some set of assumptions about how a system with the mechanism under consideration will provide value for its users in concrete situations, and those assumptions need to be validated somehow. The valuable effort put into such validation studies is seldom useful for other research or system development projects. This paper argues that use cases for information access can be written to give explicit pointers towards benchmarking mechanisms, and that if use cases and hypotheses about user preferences, goals, expectations and satisfaction are made explicit in the design of research systems, they can more conveniently be validated or disproven -- which in turn makes the results emanating from research efforts more relevant for industrial partners, more sustainable for future research and more portable across projects and studies.
Analytics for Noisy Unstructured Text Data | 2008
Antti Järvelin; Tuomas Talvensaari; Anni Järvelin
In cross-language information retrieval (CLIR), novel or non-standard expressions, technical terminology, or rare proper nouns can be seen as noise when they appear in queries or in the target collection. This kind of vocabulary is often out-of-vocabulary (OOV) for the dictionaries that are used to translate queries. In historical document retrieval (HDR), OCR errors and historical spelling variants cause similar problems. In this paper, three data-driven approaches to these problems are presented. The first two methods, the transformation rule based translation (TRT) method and the classified s-gram method, operate on the string level. With them, approximate matches of a query word can be recognized in the target document collection and included in the target query. In the third method, the corpus-based approach, parallel or comparable corpora are employed to derive translation knowledge that can be used to translate OOV words. Besides an overview of the methods, three case studies highlighting their practical applications in CLIR are presented. The methods are shown to be effective in query translation without dictionaries between closely related languages (TRT and s-grams), OOV word translation (s-grams), and boosting dictionary-based CLIR performance by way of OOV word translation (corpus-based methods).
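As a rough illustration of the string-level idea behind TRT, the sketch below applies character substitution rules to an OOV source-language word to generate plausible target-language spellings. The rules shown are hand-written placeholders and the word is hypothetical; in the actual method the rules are learned automatically from bilingual word lists, together with their contexts and confidence values.

```python
import re

# Illustrative placeholder rules of the kind TRT derives from bilingual word
# lists for closely related languages; real rules carry contexts and confidences.
RULES = [
    (r"uu", "u"),
    (r"kk", "k"),
    (r"d", "t"),
]

def trt_candidates(oov_word, rules=RULES, max_passes=2):
    """Generate target-language spelling candidates for an OOV source word by
    applying up to `max_passes` rounds of character substitution rules."""
    candidates = {oov_word}
    for _ in range(max_passes):
        new = set()
        for word in candidates:
            for pattern, repl in rules:
                if re.search(pattern, word):
                    new.add(re.sub(pattern, repl, word, count=1))
        candidates |= new
    return candidates

print(trt_candidates("mukka"))  # hypothetical source-language word
```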
Journal of Documentation | 2013
Preben Hansen; Anni Järvelin; Antti Järvelin
Purpose – This study aims to examine manually formulated queries and automatic query generation in an early phase of a patent “prior art” search. Design/methodology/approach – The study was performed partly within a patent domain setting, involving three professional patent examiners, and partly in the context of the CLEF 2009 Intellectual Property (CLEF-IP) track. For the exploratory study of user-based query formulation, three patent examiners performed the same three simulated real-life patent tasks. For the automatic query generation, a simple term-weighting algorithm based on the RATF formula was used. The manually and automatically created queries were compared to analyse what kinds of keywords were selected and from which parts of the patent documents. Findings – For user-formulated queries, it was found that patent documents were read in a specific order of importance and that the time spent varied. Annotations were made and collaboration took place while reading and selecting/ranking terms. Ranking terms was expe...
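The automatic query generation step can be sketched as follows, assuming a RATF-style weight of the form collection frequency over document frequency, damped by a logarithm of document frequency (after Pirkola et al.). The constants SP and p, the helper names, and the sample statistics are assumptions, and the exact published formula may differ; this illustrates the weighting idea rather than the implementation used in the study.

```python
import math

def ratf_weights(doc_terms, coll_freq, doc_freq, sp=100.0, p=1.0):
    """Weight a patent document's terms with a RATF-style score: terms that are
    frequent in the documents where they occur but rare across the collection
    (specialized terminology) get high weights; common terms are damped."""
    weights = {}
    for term in set(doc_terms):
        cf, df = coll_freq.get(term, 1), doc_freq.get(term, 1)
        weights[term] = (cf / df) / math.log(df + sp) ** p
    return weights

def generate_query(doc_terms, coll_freq, doc_freq, k=10):
    """Automatic query generation: keep the k highest-weighted document terms."""
    w = ratf_weights(doc_terms, coll_freq, doc_freq)
    return [t for t, _ in sorted(w.items(), key=lambda x: x[1], reverse=True)[:k]]

# Hypothetical collection statistics for a toy patent passage.
doc = "rotor blade assembly rotor blade turbine".split()
cf = {"rotor": 800, "blade": 1200, "assembly": 5000, "turbine": 900}
df = {"rotor": 40, "blade": 90, "assembly": 2000, "turbine": 60}
print(generate_query(doc, cf, df, k=3))
```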
Professional Search in the Modern World | 2014
Preben Hansen; Anni Järvelin; Gunnar Eriksson; Jussi Karlgren
Information access is no longer only a question of retrieving topical text documents in a work-task related context. Information search has become one of the most common uses of personal computers: a daily task for millions of individual users searching for information motivated by information needs they experience for some reason, momentarily or continuously. Instead of professionally edited text documents, multilingual and multimedia content from a variety of sources of varying quality needs to be accessed. The scope of research efforts in the field must therefore be broadened to better capture the mechanisms for the systems’ impact, take-up and success in the marketplace. Much work has been carried out in this direction: graded relevance and new evaluation metrics, more varied document collections used in evaluation, and different search tasks evaluated. The research in the field is, however, fragmented. Although the need for a common evaluation framework is widely acknowledged, such a framework is still not in place. IR system evaluation results are not regularly validated in Interactive IR or field studies; the infrastructure for generalizing Interactive IR results over tasks, users and collections is still missing. This chapter presents a use case-based framework for experimental design in the field of interactive information access. Use cases in general connect system design and evaluation to interaction and user goals, and help identify test cases for different user groups of a system. We suggest that use cases can also provide a useful link between information access system usage and evaluation mechanisms and thus bring together research from the different related research fields. In this chapter we discuss how use cases can guide the development of rich models of users, domains, environments, and interaction, and make explicit how these models are connected to benchmarking mechanisms. We give examples of the central features of the different models. The framework is highlighted by examples that sketch out how it can be productively used in experimental design and reporting, with a minimal threshold for adoption.
Information Processing and Management | 2007
Anni Järvelin; Antti Järvelin; Kalervo Järvelin
Cross-Language Evaluation Forum | 2008
Anni Järvelin; Peter Wilkins; Tomasz Adamek; Eija Airio; Gareth J. F. Jones; Alan F. Smeaton; Eero Sormunen