Diego Ceccarelli | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Diego Ceccarelli is active.

Explore More

Publication

Featured researches published by Diego Ceccarelli.

international world wide web conferences | 2015

GERBIL: General Entity Annotator Benchmarking Framework

Michael Röder; Axel-Cyrille Ngonga Ngomo; Ciro Baron; Andreas Both; Martin Brümmer; Diego Ceccarelli; Marco Cornolti; Didier Cherix; Bernd Eickmann; Paolo Ferragina; Christiane Lemke; Andrea Moro; Roberto Navigli; Francesco Piccinno; Giuseppe Rizzo; Harald Sack; René Speck; Raphaël Troncy; Jörg Waitelonis; Lars Wesemann

We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so as to allow them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allows deriving insights pertaining to the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.

conference on information and knowledge management | 2013

Learning relatedness measures for entity linking

Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Salvatore Trani

Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity-linking. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.

exploiting semantic annotations in information retrieval | 2013

Dexter: an open source framework for entity linking

Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Salvatore Trani

We introduce Dexter, an open source framework for entity linking. The entity linking task aims at identifying all the small text fragments in a document referring to an entity contained in a given knowledge base, e.g., Wikipedia. The annotation is usually organized in three tasks. Given an input document the first task consists in discovering the fragments that could refer to an entity. Since a mention could refer to multiple entities, it is necessary to perform a disambiguation step, where the correct entity is selected among the candidates. Finally, discovered entities are ranked by some measure of relevance. Many entity linking algorithms have been proposed, but unfortunately only a few authors have released the source code or some APIs. As a result, evaluating today the performance of a method on a single subtask, or comparing different techniques is difficult. In this work we present a new open framework, called Dexter, which implements some popular algorithms and provides all the tools needed to develop any entity linking technique. We believe that a shared framework is fundamental to perform fair comparisons and improve the state of the art.

conference on information and knowledge management | 2012

You should read this! let me explain you why: explaining news recommendations to users

Roi Blanco; Diego Ceccarelli; Claudio Lucchese; Raffaele Perego; Fabrizio Silvestri

Recommender systems have become ubiquitous in content-based web applications, from news to shopping sites. Nonetheless, an aspect that has been largely overlooked so far in the recommender system literature is that of automatically building explanations for a particular recommendation. This paper focuses on the news domain, and proposes to enhance effectiveness of news recommender systems by adding, to each recommendation, an explanatory statement to help the user to better understand if, and why, the item can be her interest. We consider the news recommender system as a black-box, and generate different types of explanations employing pieces of information associated with the news. In particular, we engineer text-based, entity-based, and usage-based explanations, and make use of a Markov Logic Networks to rank the explanations on the basis of their effectiveness. The assessment of the model is conducted via a user study on a dataset of news read consecutively by actual users. Experiments show that news recommender systems can greatly benefit from our explanation module as it allows users to discriminate between interesting and not interesting news in the majority of the cases.

theory and practice of digital libraries | 2011

Improving Europeana search experience using query logs

Diego Ceccarelli; Sergiu Gordea; Claudio Lucchese; Franco Maria Nardini; Gabriele Tolomei

Europeana is a long-term project funded by the European Commission with the goal of making Europes cultural and scientific heritage accessible to the public. Since 2008, about 1500 institutions have contributed to Europeana, enabling people to explore the digital resources of Europes museums, libraries and archives. The huge amount of collected multi-lingual multi-media data is made available today through the Europeana portal, a search engine allowing users to explore such content through textual queries. One of the most important techniques for enhancing users search experience in large information spaces, is the exploitation of the knowledge contained in query logs. In this paper we present a characterization of the Europeana query log, showing statistics on common behavioral patterns of the Europeana users. Our analysis highlights some significative differences between the Europeana query log and the historical data collected by general purpose Web Search Engine logs. In particular, we find out that both query and search session distributions show different behaviors. Finally, we use this information for designing a query recommendation technique having the goal of enhancing the functionality of the Europeana portal.

international acm sigir conference on research and development in information retrieval | 2017

Lucene4IR: Developing Information Retrieval Evaluation Resources using Lucene

Leif Azzopardi; Yashar Moshfeghi; Martin Halvey; Rami Suleiman Alkhawaldeh; Krisztian Balog; Emanuele Di Buccio; Diego Ceccarelli; Juan M. Fernández-Luna; Charlie Hull; Jake Mannix; Sauparna Palchowdhury

The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking where a number of groups created modules and infrastructure to use Lucene to undertake TREC based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event was composed of a mix and blend of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively creating the basis of many useful tools but also raising numerous issues. It was clear that by adopting and contributing to most widely used and supported Open Source IR toolkit, there were many benefits for academics, students, researchers, developers and practitioners - providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources.

conference on information and knowledge management | 2013

Twitter anticipates bursts of requests for Wikipedia articles

Gabriele Tolomei; Salvatore Orlando; Diego Ceccarelli; Claudio Lucchese

Most of the tweets that users exchange on Twitter make implicit mentions of named-entities, which in turn can be mapped to corresponding Wikipedia articles using proper Entity Linking (EL) techniques. Some of those become trending entities on Twitter due to a long-lasting or a sudden effect on the volume of tweets where they are mentioned. We argue that the set of trending entities discovered from Twitter may help predict the volume of requests for relating Wikipedia articles. To validate this claim, we apply an EL technique to extract trending entities from a large dataset of public tweets. Then, we analyze the time series derived from the hourly trending score (i.e., an index of popularity) of each entity as measured by Twitter and Wikipedia, respectively. Our results reveals that Twitter actually leads Wikipedia by one or more hours.

international acm sigir conference on research and development in information retrieval | 2016

Ranking Financial Tweets

Diego Ceccarelli; Francesco Nidito; Miles Osborne

Recently Twitter has complemented traditional newswire as a source of valuable Financial information. Although there is a rich body of published research dealing with the task of ranking tweets, there has been little published research dealing with ranking tweets within a Financial context. Here we consider whether popularity factors within Twitter can be used as a signal for popularity within the domain of financial experts. Our results suggest that what interests Finance is not the same as what interests the users of Twitter.

document engineering | 2016

SEL: A Unified Algorithm for Entity Linking and Saliency Detection

Salvatore Trani; Diego Ceccarelli; Claudio Lucchese; Salvatore Orlando; Raffaele Perego

The Entity Linking task consists in automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large im- pact in several text analysis and information retrieval related tasks. This task is very challenging due to natural language ambiguity. However, not all the entities mentioned in a document have the same relevance and utility in understanding the topics being dis- cussed. Thus, the related problem of identifying the most relevant entities present in a document, also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algo- rithm comprehensively addressing both entity linking and saliency detection. The first step is based on a classifier aimed at identi- fying a set of candidate entities that are likely to be mentioned in the document, thus maximizing the precision of the method with- out hindering its recall. The second step is still based on machine learning, and aims at choosing from the previous set the entities that actually occur in the document. Indeed, we tested two dif- ferent versions of the second step, one aimed at solving only the entity linking task, and the other that, besides detecting linked en- tities, also scores them according to their saliency. Experiments conducted on two different datasets show that the proposed algo- rithm outperforms state-of-the-art competitors, and is able to detect salient entities with high accuracy.

exploiting semantic annotations in information retrieval | 2014

Bringing Head Closer to the Tail with Entity Linking

Manisha Verma; Diego Ceccarelli

With the creation and rapid development of knowledge bases, it has become easier to understand the underlying semantics of unstructured text (short or long) on the web. In this work we especially look at the impact of entity linking on search logs. Search queries follow a Zipfian distribution wherein other than few popular queries (head queries), a significant percentage of queries (tail queries) occur rarely. Given a search log, there is sufficient data to analyze head queries but insufficient data (low frequency, limited clicks) to draw any conclusions about tail queries. In this work we focus on quantifying the extent of overlap between long tail and head queries by means of entity linking. We specifically analyze the frequency distribution of entities in head and tail queries. Our analysis shows that by means of entity linking, we can indeed bridge the gap between the head and tail.

Explore More