Avi Arampatzis
Democritus University of Thrace
Publication
Featured research published by Avi Arampatzis.
International Journal of Geographical Information Science | 2007
Ross S. Purves; Paul D. Clough; Christopher B. Jones; Avi Arampatzis; Bénédicte Bucher; David James Finch; Gaihua Fu; Hideo Joho; Awase Khirni Syed; Subodh Vaid; Bisheng Yang
Much of the information stored on the web contains geographical context, but current search engines treat such context in the same way as all other content. In this paper we describe the design, implementation and evaluation of a spatially aware search engine which is capable of handling queries in the form of the triplet of ⟨theme⟩⟨spatial relationship⟩⟨location⟩. The process of identifying geographic references in documents and assigning appropriate footprints to documents, to be stored together with document terms in an appropriate indexing structure allowing real‐time search, is described. Methods allowing users to query and explore results which have been relevance‐ranked in terms of both thematic and spatial relevance have been implemented, and a usability study indicates that users are happy with the range of spatial relationships available and intuitively understand how to use such a search engine. Normalised precision for 38 queries, containing four types of spatial relationships, is significantly higher (p<0.001) for searches exploiting spatial information than for pure text search.
Geographic Information Retrieval | 2006
Avi Arampatzis; Marc J. van Kreveld; Iris Reinbacher; Christopher B. Jones; Subodh Vaid; Paul D. Clough; Hideo Joho; Mark Sanderson
This paper describes several steps in the derivation of boundaries of imprecise regions using the Web as the information source. We discuss how to use the Web to obtain locations that are part of and locations that are not part of the region to be delineated, and then we propose methods to compute the region algorithmically. The methods introduced are evaluated to judge the potential of the approach.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001
Avi Arampatzis; André van Hameren
The thresholding of document scores has proved critical for the effectiveness of classification tasks. We review the most important approaches to thresholding, and introduce the score-distributional (S-D) threshold optimization method. The method is based on score distributions and is capable of optimizing any effectiveness measure defined in terms of the traditional contingency table. As a byproduct, we provide a model for score distributions, and demonstrate its high accuracy in describing empirical data. The estimation method can be performed incrementally, a highly desirable feature for adaptive environments. Our work in modeling score distributions is useful beyond threshold optimization problems. It directly applies to other retrieval environments that make use of score distributions, e.g., distributed retrieval, or topic detection and tracking. The most accurate version of S-D thresholding, although incremental, can be computationally heavy. Therefore, we also investigate more practical solutions. We suggest practical approximations and discuss adaptivity, threshold initialization, and incrementality issues. The practical version of S-D thresholding has been tested in the context of the TREC-9 Filtering Track and found to be very effective [2].
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009
Avi Arampatzis; Jaap Kamps; Stephen E. Robertson
Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups. We investigate and further develop methods to select the rank cut-off value which optimizes a given effectiveness measure. Assuming no other input than a system's output for a query (document scores and their distribution), the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant, and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, varying in the way the out-truncated score ranges are handled. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data, and show that the truncated models lead to significantly better results.
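To make the idea of score-distributional threshold optimization concrete, here is a minimal Python sketch. It assumes the plain (untruncated) normal-exponential mixture described in the abstract, with parameters that are already known; the function names, the parameter values, the grid search, and the choice of F1 as the effectiveness measure are all illustrative assumptions, not the authors' implementation, and the truncated models proposed in the paper are not reproduced here.

```python
import math

def normal_sf(x, mu, sigma):
    """P(score > x) when relevant scores are normal(mu, sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))

def exponential_sf(x, lam):
    """P(score > x) when non-relevant scores are exponential(lam) on [0, inf)."""
    return math.exp(-lam * x) if x > 0 else 1.0

def expected_f1(threshold, n_docs, prior_rel, mu, sigma, lam):
    """Expected F1 if every document scoring above `threshold` is returned."""
    n_rel = n_docs * prior_rel          # expected number of relevant documents
    n_non = n_docs - n_rel              # expected number of non-relevant documents
    tp = n_rel * normal_sf(threshold, mu, sigma)
    fp = n_non * exponential_sf(threshold, lam)
    fn = n_rel - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_threshold(candidates, **params):
    """Grid search: the candidate threshold with the highest expected F1."""
    return max(candidates, key=lambda t: expected_f1(t, **params))

# Hypothetical mixture parameters, chosen only for illustration.
params = dict(n_docs=1000, prior_rel=0.05, mu=3.0, sigma=0.5, lam=2.0)
grid = [i / 100 for i in range(0, 501)]
threshold = best_threshold(grid, **params)
```

Any other measure defined on the contingency table (true positives, false positives, false negatives) could be swapped in for `expected_f1` without changing the search.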
Information Processing and Management | 1998
Avi Arampatzis; T. Tsoris; Cornelis H. A. Koster
In this article we describe a retrieval schema which goes beyond the classical information retrieval keyword hypothesis and also takes linguistic variation into account. Guided by the failures and successes of other state-of-the-art approaches, as well as our own experience with the Irena system, our approach is based on phrases and incorporates linguistic resources and processors. In this respect, we introduce the phrase retrieval hypothesis to replace the keyword retrieval hypothesis. We suggest a representation of phrases suitable for indexing, and an architecture for such a retrieval system. Syntactical normalization is introduced to improve retrieval effectiveness. Morphological and lexico-semantical normalizations are adjusted to fit in this model.
Conference on Information and Knowledge Management | 2009
Avi Arampatzis; Jaap Kamps
Score normalization is indispensable in distributed retrieval and fusion or meta-search where merging of result-lists is required. Distributional approaches to score normalization with reference to relevance, such as binary mixture models like the normal-exponential, suffer from lack of universality and troublesome parameter estimation especially under sparse relevance. We develop a new approach which tackles both problems by using aggregate score distributions without reference to relevance, and is suitable for uncooperative engines. The method is based on the assumption that scores produced by engines consist of a signal and a noise component which can both be approximated by submitting well-defined sets of artificial queries to each engine. We evaluate in a standard distributed retrieval testbed and show that the signal-to-noise approach yields better results than other distributional methods. As a significant by-product, we investigate query-length distributions.
Geoinformatica | 2005
Marc J. van Kreveld; Iris Reinbacher; Avi Arampatzis; Roelof van Zwol
Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar documents are ranked spread in the list, instead of consecutively. The effect of this is that documents close together in the ranked list have less redundant information. We present various ranking methods of this type, efficient algorithms to implement them, and experiments to show the outcome of the methods.
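The scattering idea can be illustrated with a small greedy sketch in Python. This is a simplified stand-in rather than any of the ranking methods or efficient algorithms from the paper: it assumes each document is a point (textual score, spatial score) with both scores in [0, 1], and at each step it picks the document with the best trade-off between relevance and distance from the documents already ranked. The function name `scattered_rank` and the `spread` weight are hypothetical.

```python
import math

def scattered_rank(docs, spread=1.0):
    """Greedy scattered ranking.

    docs:   dict mapping doc id -> (text_score, spatial_score)
    spread: weight of the bonus for being far from already-ranked documents;
            spread=0 reduces to ranking by mean score alone.
    """
    remaining = set(docs)
    ranked = []
    while remaining:
        def gain(doc_id):
            point = docs[doc_id]
            relevance = sum(point) / len(point)
            if not ranked:
                return relevance
            # Distance to the closest document already in the list.
            nearest = min(math.dist(point, docs[r]) for r in ranked)
            return relevance + spread * nearest
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Hypothetical scores: A and B are near-duplicates in score space, C differs.
docs = {"A": (0.9, 0.9), "B": (0.88, 0.9), "C": (0.5, 0.9)}
print(scattered_rank(docs, spread=0.0))  # ['A', 'B', 'C']
print(scattered_rank(docs, spread=1.0))  # ['A', 'C', 'B']
```

With the scattering bonus switched on, the near-duplicate B drops below C, so neighbouring entries in the list carry less redundant information.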
Information Retrieval | 2011
Avi Arampatzis; Stephen E. Robertson
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being ‘friendly’ to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
Information Retrieval | 2013
Avi Arampatzis; Pavlos S. Efraimidis; George Drosatos
We propose a method for search privacy on the Internet, focusing on enhancing plausible deniability against search engine query-logs. The method approximates the target search results, without submitting the intended query and avoiding other exposing queries, by employing sets of queries representing more general concepts. We model the problem theoretically, and investigate the practical feasibility and effectiveness of the proposed solution with a set of real queries with privacy issues on a large web collection. The findings may have implications for other IR research areas, such as query expansion and fusion in meta-search. Finally, we discuss ideas for privacy, such as k-anonymity, and how these may be applied to search tasks.
Similarity Search and Applications | 2010
Konstantinos Zagoris; Avi Arampatzis; Savvas A. Chatzichristofis
We introduce an experimental search engine for multilingual and multimedia information, employing a holistic web interface and enabling the use of highly distributed indices. Modalities are searched in parallel, and results can be fused via several selectable methods. The engine also provides multistage retrieval, as well as a single text index baseline for comparison purposes. Initial impressions on its effectiveness are positive, while its efficiency may easily be improved.