Federico Nanni
University of Mannheim
Publication
Featured research published by Federico Nanni.
international conference on the theory of information retrieval | 2017
Federico Nanni; Bhaskar Mitra; Matt Magnusson; Laura Dietz
Providing answers to complex information needs is a challenging task. The new TREC Complex Answer Retrieval (TREC CAR) track introduces a large-scale dataset where paragraphs are to be retrieved in response to outlines of Wikipedia articles representing complex information needs. We present early results from a variety of approaches -- from standard information retrieval methods (e.g., TF-IDF) to complex systems that adopt query expansion, knowledge bases and deep neural networks. The goal is to offer an overview of some promising approaches to tackle this problem.
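As an illustration of the kind of standard baseline the abstract mentions, the following is a minimal sketch of TF-IDF paragraph ranking. It is not the system evaluated in the paper: the tokenizer, the smoothed IDF formula, the function names and the toy paragraphs are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank_paragraphs(query, paragraphs):
    """Return paragraph indices sorted by TF-IDF score for the query, best first."""
    docs = [Counter(tokenize(p)) for p in paragraphs]
    n = len(docs)
    # Document frequency: the number of paragraphs each term occurs in.
    df = Counter(term for doc in docs for term in doc)
    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            if term in doc:
                # Smoothed IDF (an assumption of this sketch, not from the paper).
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += doc[term] * idf
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data: a heading-like query and three candidate paragraphs.
paragraphs = [
    "Green tea contains antioxidants linked to health benefits",
    "Tea was first cultivated in China",
    "Coffee is brewed from roasted beans",
]
ranking = rank_paragraphs("tea health benefits", paragraphs)
```

Here `ranking` lists paragraph indices from most to least relevant; in a TREC CAR style setting the query would be built from the headings of an article outline.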
joint conference on lexical and computational semantics | 2016
Goran Glavaš; Federico Nanni; Simone Paolo Ponzetto
Segmenting text into semantically coherent fragments improves readability of text and facilitates tasks like text summarization and passage retrieval. In this paper, we present a novel unsupervised algorithm for linear text segmentation (TS) that exploits word embeddings and a measure of semantic relatedness of short texts to construct a semantic relatedness graph of the document. Semantically coherent segments are then derived from maximal cliques of the relatedness graph. The algorithm performs competitively on a standard synthetic dataset and outperforms the best-performing method on a real-world (i.e., non-artificial) dataset of political manifestos.
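The core idea of deriving segments from maximal cliques of a relatedness graph can be sketched as follows. This is a simplified illustration, not the published algorithm: word-overlap (Jaccard) similarity stands in for the embedding-based relatedness measure, the threshold value is arbitrary, and the rule "adjacent sentences stay together if they share a clique" is an assumption of this sketch.

```python
def jaccard(a, b):
    """Word-overlap similarity, a stand-in for embedding-based relatedness."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def build_graph(sentences, threshold):
    """Connect two sentences when their relatedness exceeds the threshold."""
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(sentences[i], sentences[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def segment(sentences, threshold=0.2):
    """Group adjacent sentences that appear together in at least one maximal clique."""
    if not sentences:
        return []
    cliques = maximal_cliques(build_graph(sentences, threshold))
    segments = [[0]]
    for i in range(1, len(sentences)):
        if any(i - 1 in c and i in c for c in cliques):
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments


# Hypothetical manifesto-like sentences covering two topics.
sentences = [
    "the party supports lower taxes",
    "lower taxes help the economy",
    "schools need more teachers",
    "more teachers improve schools",
]
segments = segment(sentences)
```

With these toy sentences the graph has two disconnected pairs, so `segments` groups the first two and the last two sentence indices into separate segments.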
acm ieee joint conference on digital libraries | 2017
Federico Nanni; Simone Paolo Ponzetto; Laura Dietz
Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by 1) identifying relevant concepts and entities from a knowledge base, and 2) detecting their mentions in documents, which are interpreted as indicators for relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record, and we test its performance on the TREC KBA Stream corpus, a large and publicly available web archive.
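The two-step idea in the abstract (collect event-related entities, then treat their mentions as relevance indicators) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the entity list is hand-written rather than drawn from a knowledge base, case-insensitive substring matching stands in for entity linking, and the scoring function and threshold are hypothetical.

```python
def mention_score(document, entities):
    """Fraction of the event-related entities mentioned in the document."""
    text = document.lower()
    mentioned = [e for e in entities if e.lower() in text]
    return len(mentioned) / len(entities) if entities else 0.0


def build_subcollection(documents, entities, min_score=0.5):
    """Keep documents whose mention score reaches the (arbitrary) threshold."""
    return [d for d in documents if mention_score(d, entities) >= min_score]


# Hypothetical entities related to an event, and a toy archive.
entities = ["Lehman Brothers", "subprime mortgage", "Federal Reserve"]
documents = [
    "Lehman Brothers filed for bankruptcy after subprime mortgage losses.",
    "The Federal Reserve cut interest rates.",
    "A new stadium opened downtown.",
]
selected = build_subcollection(documents, entities)
```

In this toy run only the first document mentions at least half of the listed entities, so it is the only one kept in `selected`.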
acm ieee joint conference on digital libraries | 2018
Federico Nanni; Simone Paolo Ponzetto; Laura Dietz
The availability of entity linking technologies provides a novel way to organize, categorize, and analyze large textual collections in digital libraries. However, in many situations a link to an entity offers only relatively coarse-grained semantic information. This is problematic especially when the entity is related to several different events, topics, roles, and -- more generally -- when it has different aspects. In this work, we introduce and address the task of entity-aspect linking: given a mention of an entity in a contextual passage, we refine the entity link with respect to the aspect of the entity it refers to. We show that a combination of different features and aspect representations in a learning-to-rank setting correctly predicts the entity-aspect in 70% of the cases. Additionally, we demonstrate significant and consistent improvements using entity-aspect linking on three entity prediction and categorization tasks relevant for the digital library community.
International Journal on Digital Libraries | 2018
Federico Nanni; Simone Paolo Ponzetto; Laura Dietz
Web archives, such as the Internet Archive, preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by identifying relevant concepts and entities from a knowledge base, and then detecting their mentions in documents, which are interpreted as indicators for relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record; additionally, we test its performance on the TREC KBA Stream Corpus and on the TREC-CAR dataset, two publicly available large-scale web collections.
computational social science | 2017
Goran Glavaš; Federico Nanni; Simone Paolo Ponzetto
In this paper, we propose an approach for cross-lingual topical coding of sentences from electoral manifestos of political parties in different languages. To this end, we exploit continuous semantic text representations and induce a joint multilingual semantic vector space to enable supervised learning using manually coded sentences across different languages. Our experimental results show that classifiers trained on multilingual data yield performance boosts over monolingual topic classification.
Proceedings of the 6th International Workshop on Mining Scientific Publications | 2017
Federico Nanni; Giulia Paci
In recent years, academic research appears to have been going through a methodological turning point. The discussion around the impact that computational methods will have on traditional fields of study has been the focus of many calls for papers and panels at established conferences. However, despite the high prevalence of this topic in the academic debate, it remains very challenging to assess whether academia as a whole has actually been adopting more digital resources and methods in recent years. We are currently studying this topic by combining hermeneutic and text mining practices while analyzing one of the primary research outputs of European universities, namely doctoral theses. In this work, we present an enriched dataset we created for addressing this research question and the first results of the analyses we have conducted so far.
Archive | 2016
Federico Nanni; Simone Paolo Ponzetto; Laura Dietz
CLiC-it | 2017
Marco Rovera; Federico Nanni; Simone Paolo Ponzetto; Anna Goy
conference of the european chapter of the association for computational linguistics | 2017
Goran Glavaš; Federico Nanni; Simone Paolo Ponzetto