Theodora Tsikrika
Queen Mary University of London
Publication
Featured research published by Theodora Tsikrika.
Conference on Information and Knowledge Management | 2001
Theodora Tsikrika; Mounia Lalmas
Data fusion on the Web refers to merging, into a single unified list, the ranked document lists retrieved in response to a user query by more than one Web search engine. It is performed by metasearch engines, whose merging algorithms utilise the information present in the ranked lists provided by the underlying search engines, such as the rank positions of the retrieved documents and their retrieval scores. In this paper, merging techniques are introduced that take into account not only the rank positions, but also the title and summary accompanying each retrieved document. Furthermore, the data fusion process is viewed as analogous to the combination of belief in uncertain reasoning and is modelled using Dempster-Shafer's theory of evidence. Our evaluation experiments indicate that these merging techniques yield improvements in effectiveness, and that their effectiveness is comparable to that of the approach that merges the ranked lists by downloading and analysing the Web documents themselves.
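To illustrate the idea, the following is a hypothetical sketch of rank-based data fusion using Dempster's rule of combination. The rank-to-mass mapping below is purely illustrative, not the paper's formulation: each engine assigns a basic mass to the hypothesis "relevant" based on a document's rank position, with the remaining mass going to the whole frame of discernment.

```python
# Hypothetical sketch of rank-based data fusion via Dempster's rule of
# combination; the rank-to-mass mapping is illustrative, not the paper's.

def rank_to_mass(rank, list_length):
    """Map a rank position (1 = best) to a basic mass assigned to the
    hypothesis 'relevant'; the remainder goes to the frame Theta."""
    return (list_length - rank + 1) / (list_length + 1)

def combine_masses(m1, m2):
    """Dempster's rule for two mass functions that assign mass only to
    {relevant} and Theta: no conflict arises, so the combined belief
    simplifies to 1 - (1 - m1)(1 - m2)."""
    return 1.0 - (1.0 - m1) * (1.0 - m2)

def fuse(ranked_lists):
    """Merge several ranked lists (each a list of document ids, best
    first) into one list ordered by combined belief in relevance."""
    belief = {}
    for docs in ranked_lists:
        n = len(docs)
        for rank, doc in enumerate(docs, start=1):
            belief[doc] = combine_masses(belief.get(doc, 0.0),
                                         rank_to_mass(rank, n))
    return sorted(belief, key=belief.get, reverse=True)

# d2 is highly ranked by both engines, so its combined belief dominates.
merged = fuse([["d1", "d2", "d3"], ["d2", "d4", "d1"]])  # → d2, d1, d4, d3
```

Because each engine assigns no mass to "not relevant", evidence from multiple engines can only reinforce a document's belief, which is what rewards documents retrieved by several engines.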
Information Processing and Management | 2006
Thomas Rölleke; Theodora Tsikrika; Gabriella Kazai
In this paper, we present a well-defined general matrix framework for modelling Information Retrieval (IR). In this framework, collections, documents and queries correspond to matrix spaces. Retrieval aspects, such as content, structure and semantics, are expressed by matrices defined in these spaces and by matrix operations applied on them. The dualities of these spaces are identified through the application of frequency-based operations on the proposed matrices and through the investigation of the meaning of their eigenvectors. This allows term weighting concepts used for content-based retrieval, such as term frequency and inverse document frequency, to translate directly to concepts for structure-based retrieval. In addition, concepts such as PageRank, authorities and hubs, determined by exploiting the structural relationships between linked documents, can be defined with respect to the semantic relationships between terms. Moreover, this mathematical framework can be used to express classical and alternative evaluation measures, involving, for instance, the structure of documents, and to further explain and relate IR models and theory. The high level of reusability and abstraction of the framework leads to a logical layer for IR that makes system design and construction significantly more efficient, and thus, better and increasingly personalised systems can be built at lower costs.
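As a small illustration of expressing retrieval concepts as matrix operations (not the paper's formalism), tf-idf can be written entirely in terms of a term-document matrix:

```python
import numpy as np

# Illustrative sketch: tf-idf expressed purely as matrix operations on a
# term-document matrix. Rows = terms, columns = documents; entries are
# raw term counts. This is a toy example, not the paper's framework.
TD = np.array([[3, 0, 1],
               [1, 2, 0],
               [0, 1, 4]], dtype=float)

n_docs = TD.shape[1]

# Term frequency: normalise each document (column) by its length.
tf = TD / TD.sum(axis=0, keepdims=True)

# Document frequency per term, and the classic idf weighting.
df = (TD > 0).sum(axis=1)
idf = np.log(n_docs / df)

# tf-idf as a diagonal scaling of the tf matrix.
tfidf = np.diag(idf) @ tf
```

The same pattern (frequency-based operations plus diagonal scalings) is what lets content-based weightings carry over to matrices encoding structure or link relationships.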
International Workshop of the Initiative for the Evaluation of XML Retrieval | 2006
Theodora Tsikrika; Thijs Westerveld
The multimedia track focuses on using the structure of the document to extract, relate, and combine the relevance of different multimedia fragments. This paper presents a brief overview of the track: its collection, tasks, and goals. We also report the results and approaches of the participating groups.
IEEE MultiMedia | 2012
Theodora Tsikrika; Jana Kludas; Adrian Popescu
The ImageCLEF Wikipedia image retrieval task aimed to support ad-hoc image retrieval evaluation using large-scale collections of Wikipedia images and their user-generated annotations.
Information Processing and Management | 2004
Theodora Tsikrika; Mounia Lalmas
In the Web context, link-based evidence is most commonly used in conjunction with content-based evidential information in order to improve retrieval effectiveness. This paper examines the impact that various types of link-based evidence, and their combination with content-based evidence, have on retrieval effectiveness for the topic relevance Web task. The inference network model is used in our study, as it supports the combination of multiple document representations and the combination of multiple results produced by different retrieval strategies. Our experiments indicate hardly any improvement in effectiveness, similarly to previous TREC results for the topic relevance task. However, they allow us to gain an insight into the behaviour of different types of link-based evidence that could be exploited in the context of other retrieval tasks.
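The inference network combination itself is beyond a short sketch, but the basic idea of mixing content-based and link-based evidence can be shown with a simple weighted linear combination, a common baseline (the function and its parameters below are hypothetical, not the paper's model):

```python
# Hypothetical sketch: combining a content-based retrieval score with
# link-based evidence (here, a normalised in-link count) via a weighted
# sum. The paper uses an inference network; this linear mixture is just
# a common baseline for intuition.

def combine_evidence(content_scores, inlink_counts, alpha=0.8):
    """Rank documents by alpha * content + (1 - alpha) * link prior."""
    max_links = max(inlink_counts.values(), default=0) or 1
    combined = {
        doc: alpha * content_scores.get(doc, 0.0)
             + (1 - alpha) * inlink_counts.get(doc, 0) / max_links
        for doc in set(content_scores) | set(inlink_counts)
    }
    return sorted(combined, key=combined.get, reverse=True)

# With alpha = 0.8, a strong content score still outweighs b's many links.
ranking = combine_evidence({"a": 0.9, "b": 0.5}, {"a": 2, "b": 10})
```

Varying `alpha` is one simple way to probe how much (or how little) link evidence contributes to a topic relevance ranking.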
European Conference on Information Retrieval | 2002
Theodora Tsikrika; Mounia Lalmas
This paper introduces an expressive formal Information Retrieval model developed for the Web. It is based on the Bayesian inference network model and views IR as an evidential reasoning process. It supports the explicit combination of multiple Web document representations under a single framework. Information extracted from the content of Web documents and derived from the analysis of the Web link structure is used as a source of evidence in support of the ranking algorithm. This content and link-based evidential information is utilised in the generation of the multiple Web document representations used in the combination.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007
Theodora Tsikrika; Thijs Westerveld
Structured document retrieval allows for the retrieval of document fragments, i.e. XML elements, containing relevant information. The main INEX Ad Hoc track focuses on text-based XML element retrieval. Although text is dominantly present in most XML document collections, other types of media can also be found. Existing research on multimedia information retrieval has shown that it is far from trivial to determine the combined relevance of a document that contains several multimedia objects. The objective of the INEX multimedia track is to exploit the XML structure that provides a logical level at which multimedia objects are connected, to improve the retrieval performance of an XML-driven multimedia information retrieval system.
International Journal on Digital Libraries | 2007
Elham Ashoori; Mounia Lalmas; Theodora Tsikrika
Content-oriented XML retrieval systems support access to XML repositories by retrieving, in response to user queries, XML document components (XML elements) instead of whole documents. The retrieved XML elements should not only contain information relevant to the query, but also provide the right level of granularity. In INEX, the INitiative for the Evaluation of XML retrieval, a relevant element is defined to be at the right level of granularity if it is exhaustive and specific to the query. Specificity was introduced to capture how focused an element is on the query (i.e., whether it discusses no other, irrelevant topics). To score XML elements according to how exhaustive and specific they are given a query, the content and logical structure of XML documents have been widely used. One source of evidence that has led to promising results with respect to retrieval effectiveness is element length. This work aims at examining a new source of evidence deriving from the semantic decomposition of XML documents. We consider that XML documents can be semantically decomposed through the application of a topic segmentation algorithm. Using the semantic decomposition and the logical structure of XML documents, we propose a new source of evidence, the number of topic shifts in an element, to reflect its relevance and more particularly its specificity. This paper has three research objectives. Firstly, we investigate the characteristics of XML elements reflected by their number of topic shifts. Secondly, we compare topic shifts to element length, by incorporating each of them as a feature in a retrieval setting and examining their effects in estimating the relevance of XML elements given a query. Finally, we use the number of topic shifts as evidence for capturing specificity to provide focused access to XML repositories.
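The core measurement can be pictured with a small sketch (the representation below, segment boundaries as token offsets, is an assumption for illustration, not the paper's algorithm): given the boundaries produced by a topic segmentation algorithm, the number of topic shifts inside an element's span indicates how focused it is.

```python
# Illustrative sketch (assumed representation, not the paper's method):
# topic-segment boundaries are token offsets produced by a topic
# segmentation algorithm; an element's topic shifts are the boundaries
# that fall strictly inside its span. Fewer shifts suggest a more
# focused (specific) element.

def topic_shifts(element_span, segment_boundaries):
    """Count segmentation boundaries strictly inside (start, end)."""
    start, end = element_span
    return sum(1 for b in segment_boundaries if start < b < end)

# A document segmented at tokens 120 and 300: an element covering
# tokens 100-350 contains two topic shifts, while a tighter element
# covering 130-280 contains none.
shifts_broad = topic_shifts((100, 350), [120, 300])    # 2
shifts_focused = topic_shifts((130, 280), [120, 300])  # 0
```

Like element length, such a count can then be plugged into a retrieval model as a feature when estimating element relevance.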
Conference on Multimedia Modeling | 2015
Theodora Tsikrika; Katerina Andreadou; Anastasia Moumtzidou; Emmanouil Schinas; Symeon Papadopoulos; Stefanos Vrochidis; Ioannis Kompatsiaris
Enabling effective multimedia information processing, analysis, and access applications in online social multimedia settings requires data representation models that capture a broad range of the characteristics of such environments and ensure interoperability. We propose a flexible model for describing Socially Interconnected MultiMedia-enriched Objects (SIMMO) that integrates in a unified manner the representation of multimedia and social features in online environments. Its specification is based on a set of identified requirements and its expressive power is illustrated using several diverse examples. Finally, a comparison of SIMMO with existing approaches demonstrates its unique features.
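The flavour of such a unified representation can be sketched as follows. This is a hypothetical minimal model of the SIMMO idea — a single object type carrying both multimedia content and its social context — with field names that are illustrative, not the published SIMMO specification:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical minimal sketch of the SIMMO idea: one object type that
# unifies multimedia content and social features. Field names are
# illustrative, not the published specification.

@dataclass
class SocialContext:
    author: str
    platform: str
    likes: int = 0
    shares: int = 0

@dataclass
class SIMMOObject:
    object_id: str
    media_type: str                 # e.g. "image", "video", "text"
    source_url: str
    annotations: List[str] = field(default_factory=list)
    social: Optional[SocialContext] = None
    linked_objects: List[str] = field(default_factory=list)  # related ids

post = SIMMOObject(
    object_id="simmo-1",
    media_type="image",
    source_url="http://example.org/img.jpg",
    annotations=["protest", "city square"],
    social=SocialContext(author="user42", platform="twitter", likes=17),
)
```

Keeping the media description and the social metadata on one object, with typed links between objects, is what lets downstream processing and access applications interoperate over a single representation.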
International Conference on Image Processing | 2014
Ioannis A. Sarafis; Christos Diou; Theodora Tsikrika; Anastasios Delopoulos
In this paper, we propose a novel approach to training noise-resilient concept detectors from clickthrough data collected by image search engines. We take advantage of query logs to automatically produce training sets for concept detectors; these, however, suffer from label noise, i.e., erroneously assigned labels. We explore two alternative approaches for handling noisy training data at the classifier level by training concept detectors with two SVM variants: the Fuzzy SVM and the Power SVM. Experimental results on images collected from a professional image search engine indicate that (1) Fuzzy SVM outperforms both SVM and Power SVM and is the most effective approach for handling label noise, and (2) the performance gain of Fuzzy SVM over SVM increases progressively with the noise level in the training sets.
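The key Fuzzy SVM idea, down-weighting examples likely to carry label noise, can be sketched with a plain weighted linear SVM trained by subgradient descent. This is a toy illustration under assumed hyperparameters, not the paper's implementation (which uses kernel Fuzzy SVM on real clickthrough data):

```python
# Hypothetical sketch of the Fuzzy SVM idea: each training example gets
# a membership weight in [0, 1] that scales its contribution to the
# hinge loss, so examples suspected of clickthrough label noise
# influence the decision boundary less. Plain subgradient-descent
# linear SVM; not the paper's implementation.

def train_fuzzy_linear_svm(X, y, weights, lam=0.01, lr=0.1, epochs=200):
    """X: feature vectors, y: labels in {-1, +1},
    weights: per-example membership values in [0, 1]."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi, si in zip(X, y, weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss active: update scaled by membership
                w = [wj - lr * (lam * wj - si * yi * xj)
                     for wj, xj in zip(w, xi)]
                b += lr * si * yi
            else:           # only the regulariser contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data: two separable clusters plus one mislabelled point whose low
# membership weight keeps it from dragging the boundary.
X = [[2.0, 2.0], [2.5, 1.8], [-2.0, -2.0], [-2.2, -1.7], [2.1, 2.1]]
y = [1, 1, -1, -1, -1]          # the last label is noise
s = [1.0, 1.0, 1.0, 1.0, 0.05]  # low membership for the noisy example
w, b = train_fuzzy_linear_svm(X, y, s)
```

With uniform memberships this reduces to an ordinary soft-margin SVM, which is why the gain of the weighted variant grows as the noise level (and thus the value of down-weighting) increases.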