Paavo Arvola | Researchain

Publication

Featured research published by Paavo Arvola.

Conference on Information and Knowledge Management | 2005

Generalized contextualization method for XML information retrieval

Paavo Arvola; Marko Junkkari; Jaana Kekäläinen

A general re-weighting method, called contextualization, for more efficient element ranking in XML retrieval is introduced. Re-weighting is based on the idea of using the ancestors of an element as a context: if the element appears in a good context -- good interpreted as probability of relevance -- its weight is increased in relevance scoring; if the element appears in a bad context, its weight is decreased. The formal presentation of contextualization is given in a general XML representation and manipulation frame, which is based on utilization of structural indices. This provides a general approach independent of weighting schemas or query languages. Contextualization is evaluated with the INEX test collection. We tested four runs: no contextualization, parent, root and tower contextualizations. The contextualization runs were significantly better than no contextualization. The root contextualization was the best among the re-weighted runs.
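
The re-weighting idea can be illustrated with a minimal sketch. The abstract does not state how element and context scores are combined, so the plain average used here is an assumption, as are the function name `contextualize`, the `scores` and `parent` dictionaries, and the example values. The four modes mirror the four runs named in the abstract.

```python
from statistics import mean


def contextualize(scores, parent, element, mode="tower"):
    """Re-weight an element's score using its ancestors as context.

    scores: dict mapping element id -> base relevance score (hypothetical values)
    parent: dict mapping element id -> parent id (the root maps to None)
    mode:   "none", "parent", "root" or "tower"
    """
    # Collect the ancestors, starting at the parent and ending at the root.
    ancestors = []
    node = parent.get(element)
    while node is not None:
        ancestors.append(node)
        node = parent.get(node)

    if mode == "none" or not ancestors:
        return scores[element]
    if mode == "parent":
        context = [ancestors[0]]
    elif mode == "root":
        context = [ancestors[-1]]
    elif mode == "tower":
        context = ancestors
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Assumption: element and context scores are combined by a plain average.
    return mean([scores[element]] + [scores[a] for a in context])


# Hypothetical three-level document: article > sec > p
parent = {"article": None, "sec": "article", "p": "sec"}
scores = {"article": 1.0, "sec": 0.5, "p": 0.25}
for mode in ("none", "parent", "root", "tower"):
    print(mode, contextualize(scores, parent, "p", mode))
```

In this toy example the paragraph has a low score of its own, so any context with a higher score raises it, which is the "good context" case described in the abstract.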

Information Retrieval | 2010

Expected reading effort in focused retrieval evaluation

Paavo Arvola; Jaana Kekäläinen; Marko Junkkari

This study introduces a novel framework for evaluating passage and XML retrieval. The framework focuses on a user’s effort to localize relevant content in a result document. Measuring the effort is based on a system-guided reading order of documents. The effort is calculated as the quantity of text the user is expected to browse through. More specifically, this study seeks evaluation metrics for retrieval methods following a specific fetch-and-browse approach, where in the fetch phase documents are ranked in decreasing order according to their document score, as in document retrieval. In the browse phase, for each retrieved document, a set of non-overlapping passages representing the relevant text within the document is retrieved. In other words, the passages of the document are re-organized, so that the best matching passages are read first in sequential order. We introduce an application scenario motivating the framework, and propose sample metrics based on the framework. These metrics give a basis for the comparison of effectiveness between traditional document retrieval and passage/XML retrieval and illuminate the benefit of passage/XML retrieval.
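
As a rough illustration of measuring effort as the quantity of text browsed, the sketch below counts the characters a reader goes through before the first relevant passage has been read. This is not one of the metrics proposed in the paper: the function `reading_effort`, the passage representation and the example numbers are all assumptions made for illustration.

```python
def reading_effort(passages, reading_order):
    """Characters browsed until the first relevant passage has been read.

    passages:      list of (length_in_chars, is_relevant) tuples in document order
    reading_order: passage indices in the order the system presents them
    """
    read = 0
    for idx in reading_order:
        length, relevant = passages[idx]
        read += length
        if relevant:
            return read
    # No relevant passage was found: all presented text was browsed.
    return read


# Hypothetical document with three passages; only the last one is relevant.
passages = [(400, False), (300, False), (200, True)]

sequential = reading_effort(passages, [0, 1, 2])  # plain document order
focused = reading_effort(passages, [2, 0, 1])     # best-matching passage first
print(sequential, focused)
```

With the relevant passage moved to the front of the reading order, the reader reaches it after 200 characters instead of 900, which is the kind of difference such a framework is meant to expose.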

Information Processing and Management | 2011

Contextualization models for XML retrieval

Paavo Arvola; Jaana Kekäläinen; Marko Junkkari

In a hierarchical XML structure, surrounding elements form the context of an XML element. In document-oriented XML, the context is a part of the semantics of the element and augments its textual information. The process of taking the context of the element into account in element scoring is called contextualization. This study extends the concept of contextualization and presents a classification of contextualization models. In an XML collection, elements are of different granularity, i.e. lower level elements are shorter and carry less textual information. Thus, it seems credible that contextualization interacts differently with diverse elements. Even if it is known that contextualization leads to improved effectiveness in element retrieval, the improvement on different granularity levels has not been investigated. This study explores the effect of contextualization on these levels. Further, a parameterized framework for testing contextualization is presented. The empirical part of the study is carried out in a traditional laboratory setting, where an XML collection is granulated. This is necessary in order to measure performance separately at different hierarchy levels. The results confirm the effectiveness of contextualization, and show how the elements of different granularities benefit from contextualization.

INEX'04 Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval | 2004

TRIX 2004: struggling with the overlap

Jaana Kekäläinen; Marko Junkkari; Paavo Arvola; Timo Aalto

In this paper, we present a new XML retrieval system prototype employing structural indices and a tf * idf weighting modification. We test retrieval methods that a) emphasize the tf part in weighting and b) allow overlap in run results to different degrees. It seems that increasing the overlap percentage leads to better performance. Emphasizing the tf part enables us to increase the exhaustiveness of the returned results.

ACM Transactions on Information Systems | 2015

Task-Based Information Interaction Evaluation: The Viewpoint of Program Theory

Kalervo Järvelin; Pertti Vakkari; Paavo Arvola; Feza Baskaya; Anni Järvelin; Jaana Kekäläinen; Heikki Keskustalo; Sanna Kumpulainen; Miamaria Saastamoinen; Reijo Savolainen; Eero Sormunen

Evaluation is central in research and development of information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is IR mechanisms’ capability of ranking relevant documents optimally for the users, given a query. Searching for information in practice involves searchers, however, and is highly interactive. When human searchers have been incorporated in evaluation studies, the results have often suggested that better ranking does not necessarily lead to better search task, or work task, performance. Therefore, it is not clear which system or interface features should be developed to improve the effectiveness of human task performance. In the present article, we focus on the evaluation of task-based information interaction (TBII). We give special emphasis to learning tasks to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching information items, selecting between them, working with them, and synthesizing and reporting. These five generic activities contribute to task performance and outcome and can be supported by information systems. In an attempt toward task-based evaluation, we introduce program theory as the evaluation framework. Such evaluation can investigate whether a program consisting of TBII activities and tools works and how it works and, further, provides a causal description of program (in)effectiveness. Our goal in the present article is to structure TBII on the basis of the five generic activities and consider the evaluation of each activity using the program theory framework. Finally, we combine these activity-based program theories in an overall evaluation framework for TBII. Such an evaluation is complex due to the large number of factors affecting information interaction. Instead of presenting tested program theories, we illustrate how the evaluation of TBII should be accomplished using the program theory framework in the evaluation of systems and behaviors, and their interactions, comprehensively in context.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Focused access to sparsely and densely relevant documents

Paavo Arvola; Jaana Kekäläinen; Marko Junkkari

XML retrieval provides focused access to the relevant content of documents. However, in evaluation, full document retrieval has appeared competitive with focused XML retrieval. We analyze the density of relevance in documents, and show that in sparsely relevant documents focused retrieval performs better, whereas in densely relevant documents the performance of focused and document retrieval is equal.
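
One simple way to make "density of relevance" concrete is the share of a document's text that is relevant. The sketch below assumes that definition. The function names and the 0.5 cut-off between sparse and dense documents are hypothetical and are not taken from the paper.

```python
def relevance_density(relevant_chars, total_chars):
    """Fraction of a document's text that is relevant (assumed definition)."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return relevant_chars / total_chars


def is_sparsely_relevant(relevant_chars, total_chars, threshold=0.5):
    """Label a document as sparsely relevant below a hypothetical threshold."""
    return relevance_density(relevant_chars, total_chars) < threshold


# Hypothetical documents: (relevant characters, total characters)
print(is_sparsely_relevant(200, 1000))
print(is_sparsely_relevant(900, 1000))
```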

Conference on Information and Knowledge Management | 2012

Contextualization using hyperlinks and internal hierarchical structure of Wikipedia documents

Muhammad Ali Norozi; Paavo Arvola; Arjen P. de Vries

Context surrounding hyperlinked semi-structured documents, externally in the form of citations and internally in the form of hierarchical structure, contains a wealth of useful but implicit evidence about a document's relevance. These rich sources of information should be exploited as contextual evidence. This paper proposes various methods of accumulating evidence from the context, and measures the effect of contextual evidence on retrieval effectiveness for document and focused retrieval of hyperlinked semi-structured documents. We propose a re-weighting model to contextualize (a) evidence from citations in a query-independent and query-dependent fashion (based on Markovian random walks) and (b) evidence accumulated from the internal tree structure of documents. The in-links and out-links of a node in the citation graph are used as external context, while the internal document structure provides internal, within-document context. We hypothesize that documents in a good context (having strong contextual evidence) should be good candidates to be relevant to the posed query, and vice versa. We tested several variants of contextualization and verified notable improvements in comparison with the baseline system and gold standards in the retrieval of full documents and focused elements.

Focused Access to XML Documents | 2008

Entity Ranking Based on Category Expansion

Janne Jämsen; Turkka Näppilä; Paavo Arvola

This paper introduces category and link expansion strategies for the XML Entity Ranking track at INEX 2007. Category expansion is a coefficient propagation method for the Wikipedia category hierarchy based on given categories or categories derived from sample entities. Link expansion utilizes links between Wikipedia articles. The strategies are evaluated within the entity ranking and list completion tasks.

Information Retrieval Facility Conference | 2012

Generating variant keyword forms for a morphologically complex language leads to successful information retrieval with Finnish

Kimmo Kettunen; Paavo Arvola

This paper discusses information retrieval of Finnish and keyword variation management by generating inflected variant keyword forms. Finnish is a highly inflectional language, and thus keyword variation management of queries and query indexes is of utmost importance for successful Finnish full-text retrieval. In the paper we show that generation of a fairly small number of variant keyword forms leads to good retrieval performance using a probabilistic best-match retrieval system (Lemur). Generation of almost the full paradigm of inflected nominal forms improves the results slightly. We also have interesting results with regard to different index types: our evaluation shows that generated inflected queries behave extremely well in a lemmatized index, which is supposedly not suitable for this query type. We also show that in a research environment even inexact generation that produces lots of incorrect inflected forms achieves high precision-recall performance without considerable loss in query throughput effectiveness. We use two different word form generators and their variants and compare the results to commonly used reductive word form variation management methods, stemming and lemmatization. The paper also includes a short discussion about usage of the variant keyword method with Web search engines.

Conference on Information and Knowledge Management | 2008

The effect of contextualization at different granularity levels in content-oriented XML retrieval

Paavo Arvola; Jaana Kekäläinen; Marko Junkkari

In the hierarchical XML structure, the ancestors form the context of an XML element. The process of taking an element's context into account in element scoring is called contextualization. The aim of this paper is to separate different granularity levels and test the effect of contextualization on these levels.
