
Publication


Featured research published by Salvatore Trani.


Conference on Information and Knowledge Management | 2013

Learning relatedness measures for entity linking

Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Salvatore Trani

Entity Linking is the task of detecting, in text documents, relevant mentions of entities from a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes disambiguation errors. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than previously proposed relatedness functions and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
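
The sketch below illustrates the kind of learning-to-rank formulation the abstract describes: relatedness judgments between a source entity and its candidates are turned into pairwise preferences and a model is trained to rank them. The feature set, labels, and model choice here are illustrative assumptions, not the authors' actual setup.

```python
# A minimal sketch of casting entity-relatedness learning as a pairwise
# learning-to-rank problem. All data and features here are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Each row: features of a (source entity, candidate entity) pair, e.g. link
# overlap, co-occurrence counts, category similarity (hypothetical features).
n_pairs, n_features = 200, 5
X = rng.normal(size=(n_pairs, n_features))
y = rng.integers(0, 3, size=n_pairs)          # graded relatedness labels from annotators
query_id = rng.integers(0, 20, size=n_pairs)  # the "query" is the source entity

# Build pairwise preferences within each query: x_i should outrank x_j iff y_i > y_j.
pair_X, pair_y = [], []
for q in np.unique(query_id):
    idx = np.where(query_id == q)[0]
    for i in idx:
        for j in idx:
            if y[i] > y[j]:
                pair_X.append(X[i] - X[j]); pair_y.append(1)
                pair_X.append(X[j] - X[i]); pair_y.append(0)

model = GradientBoostingClassifier().fit(np.array(pair_X), np.array(pair_y))

# The learned relatedness score of a candidate is the model's margin on its features.
def relatedness(features):
    return model.decision_function(features.reshape(1, -1))[0]
```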


Exploiting Semantic Annotations in Information Retrieval | 2013

Dexter: an open source framework for entity linking

Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Salvatore Trani

We introduce Dexter, an open source framework for entity linking. The entity linking task aims at identifying all the small text fragments in a document that refer to an entity contained in a given knowledge base, e.g., Wikipedia. The annotation is usually organized in three tasks. Given an input document, the first task consists in discovering the fragments that could refer to an entity. Since a mention could refer to multiple entities, it is necessary to perform a disambiguation step, where the correct entity is selected among the candidates. Finally, discovered entities are ranked by some measure of relevance. Many entity linking algorithms have been proposed, but unfortunately only a few authors have released their source code or APIs. As a result, evaluating the performance of a method on a single subtask, or comparing different techniques, is today a difficult task. In this work we present a new open framework, called Dexter, which implements some popular algorithms and provides all the tools needed to develop any entity linking technique. We believe that a shared framework is fundamental to perform fair comparisons and improve the state of the art.
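
The skeleton below mirrors the three-stage organization the abstract describes (spotting, disambiguation, ranking). The class and method names are purely illustrative and are not Dexter's actual API.

```python
# A structural sketch of a three-stage entity-linking pipeline.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Mention:
    text: str
    start: int
    end: int
    candidates: list = field(default_factory=list)  # candidate entity ids
    entity: Optional[str] = None                    # chosen entity after disambiguation
    score: float = 0.0                              # relevance score after ranking

class EntityLinker:
    def __init__(self, spotter, disambiguator, ranker):
        self.spotter = spotter              # finds fragments that may refer to entities
        self.disambiguator = disambiguator  # picks one entity per mention
        self.ranker = ranker                # scores linked entities by relevance

    def annotate(self, document: str):
        mentions = self.spotter(document)                # task 1: spotting
        for m in mentions:
            m.entity = self.disambiguator(document, m)   # task 2: disambiguation
        for m in mentions:
            m.score = self.ranker(document, m)           # task 3: ranking
        return sorted(mentions, key=lambda m: m.score, reverse=True)
```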


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Post-Learning Optimization of Tree Ensembles for Efficient Ranking

Claudio Lucchese; Franco Maria Nardini; Salvatore Orlando; Raffaele Perego; Fabrizio Silvestri; Salvatore Trani

Learning to Rank (LtR) is the machine learning method of choice for producing high-quality document ranking functions from a ground truth of training examples. In practice, efficiency and effectiveness are intertwined concepts, and trading off effectiveness to meet the efficiency constraints typical of large-scale systems is one of the most pressing issues. In this paper we propose a new framework, named CLEaVER, for optimizing machine-learned ranking models based on ensembles of regression trees. The goal is to improve efficiency at document scoring time without affecting quality. Since the cost of an ensemble is linear in its size, CLEaVER first removes a subset of the trees in the ensemble, and then fine-tunes the weights of the remaining trees according to any given quality measure. Experiments conducted on two publicly available LtR datasets show that CLEaVER is able to prune up to 80% of the trees and provides an efficiency speed-up of up to 2.6x without affecting the effectiveness of the model.
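
The sketch below shows the two-phase idea in the abstract, prune a subset of the trees and then re-weight the survivors, on a plain regression ensemble. The pruning criterion and the least-squares re-weighting are simplifications chosen for illustration, not the paper's actual procedure.

```python
# A minimal sketch of post-learning ensemble pruning and re-weighting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=20, random_state=0)
X_train, y_train, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

gbrt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Per-tree predictions on the validation set (shape: n_val x n_trees).
tree_preds = np.column_stack([t[0].predict(X_val) for t in gbrt.estimators_])

# 1) Prune: keep the 20% of trees whose individual predictions correlate most
#    with the validation target (a simple stand-in criterion).
scores = np.abs([np.corrcoef(tree_preds[:, i], y_val)[0, 1] for i in range(tree_preds.shape[1])])
keep = np.argsort(scores)[-20:]

# 2) Fine-tune: re-fit the weights of the surviving trees by least squares.
baseline = gbrt.init_.predict(X_val).ravel()
weights, *_ = np.linalg.lstsq(tree_preds[:, keep], y_val - baseline, rcond=None)

def pruned_score(X_new):
    preds = np.column_stack([gbrt.estimators_[i][0].predict(X_new) for i in keep])
    return gbrt.init_.predict(X_new).ravel() + preds @ weights
```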


Document Engineering | 2016

SEL: A Unified Algorithm for Entity Linking and Saliency Detection

Salvatore Trani; Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego

The Entity Linking task consists in automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large impact on several text analysis and information retrieval related tasks. This task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same relevance and utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algorithm comprehensively addressing both entity linking and saliency detection. The first step is based on a classifier aimed at identifying a set of candidate entities that are likely to be mentioned in the document, thus maximizing the precision of the method without hindering its recall. The second step is still based on machine learning, and aims at choosing from the previous set the entities that actually occur in the document. Indeed, we tested two different versions of the second step, one aimed at solving only the entity linking task, and the other that, besides detecting linked entities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors, and is able to detect salient entities with high accuracy.
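
The following is a schematic sketch of the two-step supervised structure the abstract outlines: a recall-oriented classifier filters candidate entities, and a second learner scores the survivors. Features, thresholds, and model choices are placeholders, not the paper's actual feature set.

```python
# A schematic sketch of a two-step candidate-pruning / saliency-scoring pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n_candidates, n_features = 1000, 10
X = rng.normal(size=(n_candidates, n_features))       # per-candidate features (synthetic)
is_linked = rng.integers(0, 2, size=n_candidates)     # ground truth: entity occurs in the doc
saliency = is_linked * rng.uniform(size=n_candidates) # zero for non-linked entities

# Step 1: candidate pruning, tuned for high recall via a permissive threshold (assumption).
step1 = RandomForestClassifier(random_state=0).fit(X, is_linked)
candidate_mask = step1.predict_proba(X)[:, 1] > 0.2

# Step 2: score the surviving candidates by saliency with a regressor.
step2 = GradientBoostingRegressor(random_state=0).fit(X[candidate_mask], saliency[candidate_mask])
salient_order = np.argsort(step2.predict(X[candidate_mask]))[::-1]
```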


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

X-DART: Blending Dropout and Pruning for Efficient Learning to Rank

Claudio Lucchese; Franco Maria Nardini; Salvatore Orlando; Raffaele Perego; Salvatore Trani

In this paper we propose X-DART, a new Learning to Rank algorithm focusing on the training of robust and compact ranking models. Motivated by the observation that the last trees of MART models impact the prediction of only a few instances of the training set, we borrow from the DART algorithm the dropout strategy of temporarily dropping some of the trees from the ensemble while new weak learners are trained. However, unlike DART, we permanently drop these trees on the basis of smart choices driven by accuracy measured on the validation set. Experiments conducted on publicly available datasets show that X-DART outperforms DART by training models that provide the same effectiveness while employing up to 40% fewer trees.
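
Below is a simplified, runnable sketch of the core idea: boost regression trees with DART-style dropout, and make a drop permanent only when the validation error does not degrade. It uses a squared-error objective for brevity and is an illustration of the concept, not the authors' implementation (which targets ranking metrics such as NDCG).

```python
# Simplified boosting loop with dropout and validation-driven permanent pruning.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=600, n_features=15, random_state=0)
Xt, yt, Xv, yv = X[:500], y[:500], X[500:], y[500:]
rng = np.random.default_rng(0)

trees, lr = [], 0.1
def predict(forest, X_):
    return sum(lr * t.predict(X_) for t in forest) if forest else np.zeros(len(X_))

for it in range(100):
    # DART-style dropout: temporarily ignore a random subset of trees.
    dropped = [t for t in trees if rng.random() < 0.1]
    active = [t for t in trees if t not in dropped]
    residual = yt - predict(active, Xt)
    new_tree = DecisionTreeRegressor(max_depth=3, random_state=it).fit(Xt, residual)

    # X-DART twist (simplified): keep the drop only if validation error
    # without the dropped trees is no worse than with them.
    err_keep = np.mean((yv - predict(trees + [new_tree], Xv)) ** 2)
    err_drop = np.mean((yv - predict(active + [new_tree], Xv)) ** 2)
    trees = (active if err_drop <= err_keep else trees) + [new_tree]
```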


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018

Selective Gradient Boosting for Effective Learning to Rank

Claudio Lucchese; Franco Maria Nardini; Raffaele Perego; Salvatore Orlando; Salvatore Trani

Learning an effective ranking function from a large number of query-document examples is a challenging task. Indeed, training sets where queries are associated with a few relevant documents and a large number of irrelevant ones are required to model real scenarios of Web search production systems, where a query can possibly retrieve thousands of matching documents, but only a few of them are actually relevant. In this paper, we propose Selective Gradient Boosting (SelGB), an algorithm addressing the Learning-to-Rank task by focusing on those irrelevant documents that are most likely to be mis-ranked, thus severely hindering the quality of the learned model. SelGB exploits a novel technique minimizing the mis-ranking risk, i.e., the probability that two randomly drawn instances are ranked incorrectly, within a gradient boosting process that iteratively generates an additive ensemble of decision trees. Specifically, at every iteration and on a per query basis, SelGB selectively chooses among the training instances a small sample of negative examples enhancing the discriminative power of the learned model. Reproducible and comprehensive experiments conducted on a publicly available dataset show that SelGB exploits the diversity and variety of the negative examples selected to train tree ensembles that outperform models generated by state-of-the-art algorithms by achieving improvements of NDCG@10 up to 3.2%.
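
The sketch below captures the selective sampling idea described above: at every boosting round, each query contributes all of its relevant documents plus only its highest-scoring irrelevant ones (the negatives most at risk of being mis-ranked), and the next tree is fit on that subset. The pointwise residual objective and all parameters are simplifications of the actual LambdaMART-style training.

```python
# A runnable sketch of per-query selective negative sampling inside gradient boosting.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_docs, n_features, n_queries = 2000, 10, 50
X = rng.normal(size=(n_docs, n_features))
y = (rng.random(n_docs) < 0.05).astype(float)   # few relevant docs per query
qid = rng.integers(0, n_queries, size=n_docs)

trees, lr, top_negatives = [], 0.1, 5
scores = np.zeros(n_docs)

for it in range(50):
    # Per query: all positives + the `top_negatives` highest-scoring negatives.
    selected = []
    for q in range(n_queries):
        idx = np.where(qid == q)[0]
        pos = idx[y[idx] > 0]
        neg = idx[y[idx] == 0]
        hard_neg = neg[np.argsort(scores[neg])[-top_negatives:]]
        selected.extend(pos); selected.extend(hard_neg)
    selected = np.array(selected)

    tree = DecisionTreeRegressor(max_depth=4, random_state=it)
    tree.fit(X[selected], y[selected] - scores[selected])   # fit pointwise residuals
    trees.append(tree)
    scores += lr * tree.predict(X)
```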


Computational Intelligence | 2018

SEL: A unified algorithm for salient entity linking

Salvatore Trani; Claudio Lucchese; Raffaele Perego; David E. Losada; Diego Ceccarelli; Salvatore Orlando

The entity linking task consists in automatically identifying and linking the entities mentioned in a text to their uniform resource identifiers in a given knowledge base. This task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as salient entities (SE), is attracting increasing interest. In this paper, we propose salient entity linking, a novel supervised two-step algorithm comprehensively addressing both entity linking and saliency detection. The first step is aimed at identifying a set of candidate entities that are likely to be mentioned in the document. The second step, besides detecting linked entities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors and is able to detect SE with high accuracy. Furthermore, we used salient entity linking for extractive text summarization. We found that entity saliency can be incorporated into text summarizers to extract salient sentences from text. The resulting summarizers outperform well-known summarization systems, proving the importance of using the SE information.
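
As a small illustration of the summarization use mentioned at the end of the abstract, the sketch below scores each sentence by the saliency of the entities it mentions and keeps the top sentences. The scoring function and the example saliency values are illustrative choices, not the paper's exact method.

```python
# Saliency-driven extractive summarization sketch.
def summarize(sentences, entity_saliency, k=3):
    """sentences: list of (text, entities_mentioned); entity_saliency: dict entity -> score."""
    def sentence_score(entities):
        return sum(entity_saliency.get(e, 0.0) for e in entities)
    ranked = sorted(sentences, key=lambda s: sentence_score(s[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Example usage with hypothetical saliency scores produced by the second step.
doc = [("Rome hosted the summit.", ["Rome", "Summit"]),
       ("The weather was mild.", []),
       ("Leaders discussed trade with China.", ["China", "Trade"])]
print(summarize(doc, {"Rome": 0.9, "Summit": 0.4, "China": 0.7, "Trade": 0.3}, k=2))
```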


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2017

RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions

Claudio Lucchese; Cristina Ioana Muntean; Franco Maria Nardini; Raffaele Perego; Salvatore Trani

In this demo paper we propose RankEval, an open-source tool for the analysis and evaluation of Learning-to-Rank (LtR) models based on ensembles of regression trees. Gradient Boosted Regression Trees (GBRT) is a flexible statistical learning technique for classification and regression, and represents the state of the art for training effective LtR solutions. Indeed, the success of GBRT fostered the development of several open-source LtR libraries targeting efficiency of the learning phase and effectiveness of the resulting models. However, these libraries offer only very limited help for the tuning and evaluation of the trained models. In addition, the implementations provided for even the most traditional IR evaluation metrics differ from library to library, making the objective evaluation and comparison of trained models a difficult task. RankEval addresses these issues by providing a common ground for LtR libraries that offers useful and interoperable tools for a comprehensive comparison and in-depth analysis of ranking models.
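
To make the "common ground" point concrete, the sketch below shows the kind of metric such a tool standardizes: a single NDCG@10 implementation applied identically to any model that exposes a scoring function. This is not RankEval's API, only an illustrative stand-in.

```python
# Library-agnostic NDCG@10: the same metric code for every trained ranker.
import numpy as np

def ndcg_at_k(relevance_by_query, scores_by_query, k=10):
    """Mean NDCG@k over queries; both arguments map qid -> numpy array over documents."""
    def dcg(rels):
        return float(np.sum((2.0 ** rels - 1) / np.log2(np.arange(2, len(rels) + 2))))
    vals = []
    for q, rels in relevance_by_query.items():
        order = np.argsort(scores_by_query[q])[::-1][:k]
        ideal = np.sort(rels)[::-1][:k]
        vals.append(dcg(rels[order]) / dcg(ideal) if dcg(ideal) > 0 else 0.0)
    return float(np.mean(vals))

# Usage idea: compare two rankers trained with different libraries on the same metric,
# e.g. ndcg_at_k(labels, {q: model_a_scores[q] ...}) vs ndcg_at_k(labels, {q: model_b_scores[q] ...}).
```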


Conference on Information and Knowledge Management | 2014

Manual Annotation of Semi-Structured Documents for Entity-Linking

Salvatore Trani; Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego

The Entity Linking (EL) problem consists in automatically linking short fragments of text within a document to entities in a given Knowledge Base, like Wikipedia. Due to its impact on several text-understanding related tasks, EL is a hot research topic. The related problem of identifying the most relevant entities mentioned in the document, a.k.a. salient entities (SE), is also attracting increasing interest. Unfortunately, publicly available evaluation datasets that contain accurate and supervised knowledge about the mentioned entities and their relevance ranking are currently very poor both in number and quality. This lack makes it very difficult to compare different EL and SE solutions on a fair basis, as well as to devise innovative techniques that rely on these datasets to train machine learning models, in turn used to automatically link and rank entities. In this demo paper we propose a Web-deployed tool that allows the creation of these datasets to be crowdsourced, by supporting the collaborative human annotation of semi-structured documents. The tool, called Elianto, is an open source framework that provides a user-friendly and reactive Web interface to support both EL and SE labelling tasks, through a guided two-step process.
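
The sketch below shows the kind of annotation record a collaborative two-step tool like this could produce: first link a mention to a Knowledge Base entity, then assign it a saliency rank. The schema is hypothetical and only illustrates the described workflow.

```python
# Hypothetical annotation record for a two-step (EL, then SE) labelling process.
from dataclasses import dataclass

@dataclass
class EntityAnnotation:
    mention: str      # surface form in the document
    start: int        # character offset of the mention
    entity: str       # Knowledge Base entry the annotator linked it to
    saliency: int     # rank assigned in the second (SE) labelling step
    annotator: str    # who produced the judgment (useful for agreement analysis)

doc_annotations = [
    EntityAnnotation("Barack Obama", 0, "Barack_Obama", saliency=1, annotator="a1"),
    EntityAnnotation("Hawaii", 42, "Hawaii", saliency=2, annotator="a1"),
]
```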


International Semantic Web Conference | 2014

Dexter 2.0: an open source tool for semantically enriching data

Salvatore Trani; Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego

Collaboration


Dive into Salvatore Trani's collaborations.

Top Co-Authors

Raffaele Perego, Istituto di Scienza e Tecnologie dell'Informazione
Claudio Lucchese, Istituto di Scienza e Tecnologie dell'Informazione
Salvatore Orlando, Ca' Foscari University of Venice
Franco Maria Nardini, Istituto di Scienza e Tecnologie dell'Informazione
Fabrizio Silvestri, Istituto di Scienza e Tecnologie dell'Informazione
Alberto De Francesco, IMT Institute for Advanced Studies Lucca
Cristina Ioana Muntean, Istituto di Scienza e Tecnologie dell'Informazione
Nicola Tonellotto, Istituto di Scienza e Tecnologie dell'Informazione