Gianni Amati
Fondazione Ugo Bordoni
Publications
Featured research published by Gianni Amati.
ACM Transactions on Information Systems | 2002
Gianni Amati; Cornelis Joost van Rijsbergen
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose--Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document--query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
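To make the shape of these weighting formulae concrete, here is a minimal sketch of one well-known instantiation of the framework (an inverse document frequency randomness model with a Laplace after-effect and length normalisation, as popularised by the Terrier implementation). The function and parameter names, and the choice of this particular model, are illustrative assumptions, not the paper's notation.

```python
import math

def dfr_inl2_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, query_tf=1, c=1.0):
    """Sketch of an InL2-style Divergence-From-Randomness term weight.

    tf          -- raw term frequency in the document
    doc_len     -- document length in tokens
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection (N)
    doc_freq    -- number of documents containing the term (n_t)
    query_tf    -- term frequency in the query
    c           -- free parameter of the length normalisation
    """
    if tf == 0:
        return 0.0
    # Second normalisation: rescale tf as if the document had average length.
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # First normalisation (Laplace after-effect): information gain from accepting
    # the term as a good descriptor of the document.
    after_effect = tfn / (tfn + 1.0)
    # Informative content of the term under the randomness model.
    inf_content = math.log2((n_docs + 1.0) / (doc_freq + 0.5))
    return query_tf * after_effect * inf_content

# Example: a term occurring 3 times in a 150-token document.
print(dfr_inl2_weight(tf=3, doc_len=150, avg_doc_len=300, n_docs=100_000, doc_freq=120))
```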
european conference on information retrieval | 2005
Iadh Ounis; Gianni Amati; Vassilis Plachouras; Ben He; Craig Macdonald; Douglas Johnson
Terrier is a modular platform for the rapid development of large-scale Information Retrieval (IR) applications. It can index various document collections, including TREC and Web collections. Terrier also offers a range of document weighting and query expansion models, based on the Divergence From Randomness framework. It has been successfully used for ad-hoc retrieval, cross-language retrieval, Web IR and intranet search, in a centralised or distributed setting.
Information Processing and Management | 1999
Gianni Amati; Fabio Crestani
New methods and new systems are needed to filter or to selectively distribute the increasing volume of electronic information being produced nowadays. An effective information filtering system is one that provides the exact information that fulfills the user's interests with the minimum effort by the user to describe it. Such a system will have to be adaptive to the user's changing interests. In this paper we describe and evaluate a learning model for information filtering which is an adaptation of the generalized probabilistic model of Information Retrieval. The model is based on the concept of 'uncertainty sampling', a technique that allows for relevance feedback on both relevant and nonrelevant documents. The proposed learning model is the core of a prototype information filtering system called ProFile.
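A minimal sketch of the uncertainty-sampling step (illustrative names only, not ProFile's actual estimator): the documents whose current probability of relevance is closest to 0.5 are the ones the filter is least certain about, so those are the ones submitted to the user for relevance judgements, on both the relevant and the nonrelevant side.

```python
def select_for_feedback(scored_docs, k=5):
    """Pick the k documents whose estimated relevance probability is closest to 0.5.

    scored_docs -- list of (doc_id, prob_relevant) pairs from the current model
    Returns the doc_ids the user is asked to judge; the judgements are then fed
    back to retrain the probabilistic model.
    """
    by_uncertainty = sorted(scored_docs, key=lambda d: abs(d[1] - 0.5))
    return [doc_id for doc_id, _ in by_uncertainty[:k]]

# Example: the filter is confident about d1 and d4, unsure about d2 and d3.
docs = [("d1", 0.95), ("d2", 0.52), ("d3", 0.47), ("d4", 0.03)]
print(select_for_feedback(docs, k=2))  # -> ['d2', 'd3']
```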
The Computer Journal | 2000
Gianni Amati; Iadh Ounis
We study Sowa's conceptual graphs (CGs) with both existential and universal quantifiers. We explore in detail the existential fragment. We extend and modify Sowa's original graph derivation system with new rules and prove the soundness and completeness theorem with respect to Sowa's standard interpretation of CGs into first order logic (FOL). The proof is obtained by reducing the graph derivation to a question-answering problem. The graph derivation can be equivalently obtained by querying a Definite Horn Clauses program with a conjunction of positive atoms. Moreover, the proof provides an algorithm for graph derivation in a pure proof-theoretic fashion, namely by means of a slight enhancement of the standard PROLOG interpreter. The graph derivation can be rebuilt step-by-step and constructively from the resolution-based proof. We provide a notion of CGs in normal form (the table of the conceptual graph) and show that the PROLOG interpreter also gives a projection algorithm between normal CGs. The normal forms are obtained by extending the FOL language with witnesses (new constants) and extending the graph derivation system. By iteratively applying a set of rules, the reduction process terminates with the normal form of a conceptual graph. We also show that graph derivation can be reduced to a question-answering problem in propositional datalog for a subclass of simple CGs. The embedding into propositional datalog makes the complexity of the derivation polynomial.
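The target formalism of the last reduction can be pictured with a tiny bottom-up evaluator: the facts are closed under propositional Horn rules, and the graph-derivation query becomes a conjunction of atoms checked against the least fixpoint. This is only an illustrative sketch of propositional datalog evaluation, not the paper's derivation procedure.

```python
def datalog_fixpoint(facts, rules):
    """Naive bottom-up evaluation of propositional Horn rules.

    facts -- set of atoms (strings) known to hold
    rules -- list of (body, head) pairs: if every atom in body holds, head holds
    Returns the least fixpoint, i.e. all derivable atoms (polynomial time).
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived

def query_holds(facts, rules, query):
    """A conjunctive query succeeds iff every one of its atoms is derivable."""
    fixpoint = datalog_fixpoint(facts, rules)
    return all(atom in fixpoint for atom in query)

# Illustrative atoms standing in for the propositional encoding of two CGs.
facts = {"cat_felix", "on_felix_mat"}
rules = [({"cat_felix"}, "animal_felix")]
print(query_holds(facts, rules, {"animal_felix", "on_felix_mat"}))  # True
```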
Journal of Logic and Computation | 1996
Gianni Amati; Luigia Carlucci Aiello; Dov M. Gabbay; Fiora Pirri
We present a general proof theoretical methodology for default systems. Given a default theory 〈W, D〉, the default rules D are simply understood as restrictions on the tableaux construction of the logic. Different default approaches have their own way of understanding these restrictions and executing them. For each default approach (such as Reiter, Brewka or Lukaszewicz), the allowable default extensions can be obtained from the default tableau construction. The advantage of our approach, besides being simple and neat, is in its generality: it allows for the development of a default theory for any logic with a tableau formulation, such as intuitionistic logic, linear logic or modal logic.
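For reference, the fixed-point definition of an extension in Reiter's default logic, which the tableau restrictions recast proof-theoretically, can be written as follows (standard textbook notation, not the paper's):

```latex
E \text{ is an extension of } \langle W, D\rangle \iff E = \bigcup_{i \ge 0} E_i,
\quad\text{where}\quad
E_0 = W,\qquad
E_{i+1} = \mathrm{Th}(E_i) \cup
  \Bigl\{\, \gamma \;\Bigm|\; \tfrac{\alpha:\beta}{\gamma} \in D,\ \alpha \in E_i,\ \neg\beta \notin E \,\Bigr\}.
```

Note that the consistency check refers to the final set E rather than to E_i, which is what makes this a fixed-point condition; the variant approaches mentioned above (Brewka, Lukaszewicz) differ mainly in how the justifications are checked.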
cross language evaluation forum | 2002
Gianni Amati; Claudio Carpineto; Giovanni Romano
PROSIT (PRObabilistic Sifting of Information Terms) is a novel probabilistic information retrieval system that combines a term-weighting model based on deviation from randomness with information-theoretic query expansion. We report on the application of PROSIT to the Italian monolingual task at CLEF. We experimented with both standard PROSIT and with enhanced versions. In particular, we studied the use of bigrams and coordination level-based retrieval within the PROSIT framework. The main findings of our research are that (i) standard PROSIT was quite effective, with an average precision of 0.5116 on CLEF 2001 queries and 0.5019 on CLEF 2002 queries, (ii) bigrams were useful provided that they were incorporated into the main algorithm, and (iii) the benefits of coordination level-based retrieval were unclear.
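A minimal sketch of the information-theoretic expansion step (in the spirit of Kullback-Leibler term scoring over pseudo-relevant documents; the names, smoothing, and cut-offs below are assumptions made for the example): candidate terms from the top-ranked documents are scored by how much their distribution there diverges from their distribution in the whole collection, and the highest-scoring terms are added to the query.

```python
import math

def kld_expansion_terms(pseudo_rel_tf, collection_tf, collection_size, n_terms=10):
    """Score candidate expansion terms by a KL-divergence contribution.

    pseudo_rel_tf   -- {term: frequency} over the top-ranked (pseudo-relevant) docs
    collection_tf   -- {term: frequency} over the whole collection
    collection_size -- total number of tokens in the collection
    Returns the n_terms highest-scoring candidate terms.
    """
    rel_size = sum(pseudo_rel_tf.values())
    scores = {}
    for term, freq in pseudo_rel_tf.items():
        p_rel = freq / rel_size
        p_coll = collection_tf.get(term, 1) / collection_size  # crude smoothing
        scores[term] = p_rel * math.log2(p_rel / p_coll)
    return sorted(scores, key=scores.get, reverse=True)[:n_terms]

# Toy counts: "etna" is rare in the collection but frequent in the top documents,
# so it receives a high expansion score; the stopword "il" does not.
top_docs = {"etna": 12, "eruzione": 7, "il": 40}
collection = {"etna": 90, "eruzione": 60, "il": 500_000}
print(kld_expansion_terms(top_docs, collection, collection_size=1_000_000, n_terms=2))
```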
cross language evaluation forum | 2003
Gianni Amati; Claudio Carpineto; Giovanni Romano
Motivated by the hypothesis that the retrieval performance of a weighting model is independent of the language in which queries and collection are expressed, we compared the retrieval performance of three weighting models, i.e., Okapi, statistical language modeling (SLM), and deviation from randomness (DFR), on three monolingual test collections, i.e., French, Italian, and Spanish. The DFR model was found to consistently achieve better results than both Okapi and SLM, whose performance was comparable. We also evaluated whether the use of retrieval feedback improved retrieval performance; retrieval feedback was beneficial for DFR and Okapi and detrimental for SLM. Besides relative performance, DFR with retrieval feedback achieved excellent absolute results: best run for Italian and Spanish, third run for French.
european conference on information retrieval | 2002
Gianni Amati; C. J. van Rijsbergen
We exploit the Feller-Pareto characterization of the classical Pareto distribution to derive a law relating the probability of a given term frequency in a document to the document length. A similar law was derived by Mandelbrot. We exploit the Paretian distribution to obtain a term frequency normalization to substitute for the actual term frequency in the probabilistic models of Information Retrieval recently introduced at TREC-10. Preliminary results show that the unique parameter of the framework can be eliminated in favour of the term frequency normalization derived from the Paretian law.
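For background, the parameterised length normalisation used in the TREC-10 models (the single free parameter that the Paretian law is intended to eliminate) rescales the raw term frequency to the frequency expected in a document of standard length; the formula below is the usual "normalisation 2" form and is given only as context, not as the paper's derivation:

```latex
\mathit{tfn} \;=\; \mathit{tf}\cdot\log_2\!\left(1 + c\cdot\frac{\mathit{avg\_dl}}{\mathit{dl}}\right),
```

where dl is the document length, avg_dl the average document length in the collection, and c > 0 the tunable parameter.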
conference on information and knowledge management | 2012
Gianni Amati; Giuseppe Amodeo; Carlo Gaibisso
Freshness of information in real-time search is central in social networks, news, blogs and micro-blogs. Nevertheless, there is no clear experimental evidence showing which principled approach effectively combines time and content. We introduce a novel approach to model freshness using a survival analysis of relevance over time. In such models, freshness is measured by the tail probability of relevance over time. We also assume that the probability distributions for freshness are heavy-tailed. The heavy-tailed models of freshness are shown to be highly effective on the micro-blogging test collection of TREC 2011. The improvements over the state-of-the-art time-based models are statistically significant or moderately significant.
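A minimal sketch of the idea (distributions and names chosen for illustration, not the authors' estimator): freshness at document age t is the survival probability S(t) = P(T > t) of relevance, and a heavy-tailed survival function decays much more slowly in the tail than the exponential decay assumed by earlier time-based models.

```python
import math

def exponential_survival(age, rate=0.5):
    """Light-tailed baseline: P(still relevant beyond age t) decays exponentially."""
    return math.exp(-rate * age)

def log_logistic_survival(age, scale=2.0, shape=1.5):
    """Heavy-tailed alternative: the survival probability decays polynomially."""
    return 1.0 / (1.0 + (age / scale) ** shape)

def fresh_score(content_score, age, survival=log_logistic_survival):
    """Combine a content relevance score with a freshness (survival) prior."""
    return content_score * survival(age)

# A day-old and a week-old post with the same content score: the heavy-tailed
# prior penalises the older post far less than the exponential one would.
for age_in_days in (1.0, 7.0):
    print(age_in_days,
          fresh_score(3.2, age_in_days, exponential_survival),
          fresh_score(3.2, age_in_days, log_logistic_survival))
```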
INEX'04: Proceedings of the Third International Conference on the Initiative for the Evaluation of XML Retrieval | 2004
Gianni Amati; Claudio Carpineto; Giovanni Romano
Using separate indices for each element and merging their results has proven to be a feasible way of performing XML element retrieval; however, there has been little work on evaluating how the main method parameters affect the results. We study the effect of using different weighting models for computing rankings at the single-index level and using different merging techniques for combining such rankings. Our main findings are that (i) there are large variations in retrieval effectiveness when choosing different techniques for weighting and merging, with performance gains up to 102%, and (ii) although there does not seem to be any best weighting model, some merging schemes perform clearly better than others.
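One simple merging scheme of the kind compared in this setting (a hedged sketch; the names and the choice of CombSUM with min-max normalisation are illustrative, not necessarily the best-performing combination reported): scores from each per-element index are rescaled to a common range and then summed, optionally with per-index weights.

```python
def min_max_normalise(run):
    """Rescale a {element_id: score} run to the [0, 1] interval."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {element: 1.0 for element in run}
    return {element: (score - lo) / (hi - lo) for element, score in run.items()}

def merge_runs(runs, weights=None):
    """CombSUM-style merge of several per-index runs into a single ranking.

    runs    -- list of {element_id: score} dicts, one per element index
    weights -- optional per-index weights (defaults to 1.0 for each index)
    """
    weights = weights or [1.0] * len(runs)
    merged = {}
    for run, weight in zip(runs, weights):
        for element, score in min_max_normalise(run).items():
            merged[element] = merged.get(element, 0.0) + weight * score
    return sorted(merged, key=merged.get, reverse=True)

# Example: merging an article-level run with a section-level run.
article_run = {"a1": 12.0, "a2": 7.5}
section_run = {"a1/sec[2]": 9.1, "a2/sec[1]": 8.8}
print(merge_runs([article_run, section_run]))
```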