Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Roberto Zanoli is active.

Publication


Featured research published by Roberto Zanoli.


Meeting of the Association for Computational Linguistics | 2014

The Excitement Open Platform for Textual Inferences

Bernardo Magnini; Roberto Zanoli; Ido Dagan; Kathrin Eichler; Guenter Neumann; Tae-Gil Noh; Sebastian Padó; Asher Stern; Omer Levy

This paper presents the Excitement Open Platform (EOP), a generic architecture and a comprehensive implementation for textual inference in multiple languages. The platform includes state-of-the-art algorithms, a large number of knowledge resources, and facilities for experimenting and testing innovative approaches. The EOP is distributed as open-source software.


ACM Transactions on Speech and Language Processing | 2006

Automatic expansion of domain-specific lexicons by term categorization

Henri Avancini; Alberto Lavelli; Fabrizio Sebastiani; Roberto Zanoli

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each c_i in a predefined set C = {c_1, …, c_m} of semantic domains, an initial lexicon L^i_0 into a larger lexicon L^i_1. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNet Domains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
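The dual representation described above (terms as vectors in a space of documents) can be sketched in a few lines. The toy corpus, seed lexicons, and the centroid classifier below are all invented for illustration; a simple centroid model stands in for the boosting-based learner used in the paper.

```python
# Term categorization sketch: represent each term by its per-document counts,
# build one centroid per domain from the seed lexicons, and assign unlabeled
# terms to the nearest centroid. All data here is a toy example.
from math import sqrt

# Toy corpus: each document is a bag of terms.
docs = [
    ["goal", "match", "striker"],        # sports-flavoured
    ["match", "league", "striker"],
    ["stock", "market", "shares"],       # finance-flavoured
    ["market", "shares", "dividend"],
]

# Initial (seed) lexicons L^i_0 for each domain c_i.
seed = {"sports": {"goal", "league"}, "finance": {"stock", "dividend"}}

def term_vector(term):
    """Dual representation: a term is a vector of its counts per document."""
    return [doc.count(term) for doc in docs]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# One centroid per domain, averaged over the seed terms' vectors.
centroids = {}
for domain, terms in seed.items():
    vecs = [term_vector(t) for t in terms]
    centroids[domain] = [sum(col) / len(vecs) for col in zip(*vecs)]

def categorize(term):
    """Assign a previously unlabeled term to the closest domain centroid."""
    return max(centroids, key=lambda d: cosine(term_vector(term), centroids[d]))

# Expand the lexicons with previously unlabeled terms.
for t in ["striker", "shares"]:
    print(t, "->", categorize(t))
```

Here "striker" lands in the sports lexicon and "shares" in the finance one, because their document-occurrence patterns match the respective seed terms.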


Natural Language Engineering | 2015

Design and realization of a modular architecture for textual entailment

Sebastian Padó; Tae-Gil Noh; Asher Stern; Rui Wang; Roberto Zanoli

A key challenge at the core of many Natural Language Processing (NLP) tasks is the ability to determine which conclusions can be inferred from a given natural language text. This problem, called the Recognition of Textual Entailment (RTE), has initiated the development of a range of algorithms, methods, and technologies. Unfortunately, research on Textual Entailment (TE), like semantics research more generally, is fragmented into studies focussing on various aspects of semantics such as world knowledge, lexical and syntactic relations, or more specialized kinds of inference. This fragmentation has problematic practical consequences. Notably, interoperability among the existing RTE systems is poor, and reuse of resources and algorithms is mostly infeasible. This also makes systematic evaluations very difficult to carry out. Finally, textual entailment presents a wide array of approaches to potential end users with little guidance on which to pick. Our contribution to this situation is the novel EXCITEMENT architecture, which was developed to enable and encourage the consolidation of methods and resources in the textual entailment area. It decomposes RTE into components with strongly typed interfaces. We specify (a) a modular linguistic analysis pipeline and (b) a decomposition of the ‘core’ RTE methods into top-level algorithms and subcomponents. We identify four major subcomponent types, including knowledge bases and alignment methods. The architecture was developed with a focus on generality, supporting all major approaches to RTE and encouraging language independence. We illustrate the feasibility of the architecture by constructing mappings of major existing systems onto the architecture. The practical implementation of this architecture forms the EXCITEMENT open platform.
It is a suite of textual entailment algorithms and components which contains the three systems named above, including linguistic-analysis pipelines for three languages (English, German, and Italian), and comprises a number of linguistic resources. By addressing the problems outlined above, the platform provides a comprehensive and flexible basis for research and experimentation in textual entailment and is available as open source software under the GNU General Public License.
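The kind of strongly typed decomposition the abstract describes (an analysis pipeline feeding a top-level entailment algorithm built from pluggable subcomponents) can be sketched as below. All class and method names are illustrative, not the platform's actual API, and the word-overlap baseline is a deliberately trivial stand-in for a real core algorithm.

```python
# Sketch of a typed RTE component decomposition: a linguistic analysis
# pipeline interface and a top-level entailment-algorithm interface, each
# with a toy implementation. Names are invented for illustration.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class TEPair:
    text: str
    hypothesis: str

@dataclass
class Decision:
    entailment: bool
    confidence: float

class AnalysisPipeline(ABC):
    """Language-specific linguistic preprocessing (tokenisation, parsing, ...)."""
    @abstractmethod
    def annotate(self, pair: TEPair) -> TEPair: ...

class EntailmentAlgorithm(ABC):
    """Top-level 'core' RTE method; real ones delegate to subcomponents
    such as knowledge bases and alignment methods."""
    @abstractmethod
    def decide(self, pair: TEPair) -> Decision: ...

class WhitespacePipeline(AnalysisPipeline):
    def annotate(self, pair: TEPair) -> TEPair:
        return TEPair(pair.text.lower(), pair.hypothesis.lower())

class WordOverlapAlgorithm(EntailmentAlgorithm):
    """Trivial baseline: entail if most hypothesis words occur in the text."""
    def decide(self, pair: TEPair) -> Decision:
        t, h = set(pair.text.split()), set(pair.hypothesis.split())
        overlap = len(t & h) / len(h) if h else 0.0
        return Decision(entailment=overlap >= 0.5, confidence=overlap)

pipeline, algorithm = WhitespacePipeline(), WordOverlapAlgorithm()
pair = pipeline.annotate(TEPair("A man is riding a horse", "A man rides an animal"))
print(algorithm.decide(pair))
```

Because the interfaces are typed, any pipeline can be combined with any core algorithm, which is the interoperability property the architecture is after.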


ACM Symposium on Applied Computing | 2003

Expanding domain-specific lexicons by term categorization

Henri Avancini; Alberto Lavelli; Bernardo Magnini; Fabrizio Sebastiani; Roberto Zanoli

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, …, c_m} of domains, a lexicon L^i_1, bootstrapping from an initial lexicon L^i_0 and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.


Database | 2016

A knowledge-poor approach to chemical-disease relation extraction

Firoj Alam; Anna Corazza; Alberto Lavelli; Roberto Zanoli

The article describes a knowledge-poor approach to the task of extracting Chemical-Disease Relations from PubMed abstracts. A first version of the approach was applied during the participation in BioCreative V track 3, both in Disease Named Entity Recognition and Normalization (DNER) and in Chemical-induced diseases (CID) relation extraction. For both tasks, we adopted a general-purpose approach based on machine learning techniques, integrated with a limited number of domain-specific knowledge resources and using freely available tools for preprocessing data. Crucially, the system only uses the data sets provided by the organizers. The aim is to design an easily portable approach with a limited need for domain-specific knowledge resources. In the BioCreative V task, we ranked 5th out of 16 in DNER and 7th out of 18 in CID. In this article, we present our follow-up study, in particular on CID, by performing further experiments, extending our approach and improving the performance.
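The shape of such a CID-style pipeline, heavily simplified, is sketched below: find chemical and disease mentions, pair up co-occurrences within a sentence, and score the candidate pairs. The tiny lexicons stand in for learned NER models, and the trigger-word rule stands in for the trained relation classifier; all data and names are invented.

```python
# Rough sketch of a knowledge-poor chemical-induced-disease pipeline:
# lexicon-based mention spotting, candidate pairing, and a trivial
# rule-based scorer in place of a trained classifier. Toy data only.
chemicals = {"aspirin", "lithium"}
diseases = {"headache", "tremor"}
triggers = {"induced", "caused", "associated"}

def extract_cid(sentence):
    toks = sentence.lower().replace(",", "").split()
    chems = [t for t in toks if t in chemicals]
    dis = [t for t in toks if t in diseases]
    has_trigger = any(t in triggers for t in toks)
    # Candidate pairs are all chemical-disease co-occurrences; keep a pair
    # only if an inducing trigger word appears in the same sentence.
    return [(c, d) for c in chems for d in dis if has_trigger]

print(extract_cid("Lithium induced tremor was observed"))
```

A real system would replace each of the three stages with a learned model; the point here is only the candidate-generation-plus-filtering structure.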


International Workshop on Evaluation of Natural Language and Speech Tools for Italian | 2013

Exploiting Background Knowledge for Clustering Person Names

Roberto Zanoli; Francesco Corcoglioniti; Christian Girardi

Nowadays, surfing the Web looking for persons seems to be one of the most common activities of Internet users. However, person names can be highly ambiguous, and consequently search results are often a collection of documents about different people sharing the same name. In this paper, a cross-document coreference system is presented that is able to identify person names in different documents referring to the same person entity. The system exploits background knowledge through two mechanisms: (1) the use of a dynamic similarity threshold for clustering person names, which depends on the ambiguity of the name as estimated using a phonebook; and (2) the disambiguation of names against a knowledge base containing person descriptions, using an entity linking system and including its output as an additional feature for computing similarity. The paper describes the system and reports its performance, evaluated by taking part in the News People Search (NePS) task at Evalita 2011. A version of the system is being used in a real-world application, which requires coreferring millions of names from multimedia sources.
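The first mechanism, a clustering threshold that rises with name ambiguity, can be sketched as follows. The phonebook counts, the feature sets, and the greedy clustering are all invented for illustration and much simpler than the actual system.

```python
# Sketch of ambiguity-dependent clustering: the similarity threshold for
# merging two mentions of a name grows with the name's phonebook frequency,
# so common (ambiguous) names need stronger evidence to be coreferred.
import math

phonebook_freq = {"maria rossi": 1200, "roberto zanoli": 3}  # toy counts

def threshold(name, base=0.2, scale=0.1):
    """More frequent names get a higher merge threshold."""
    return base + scale * math.log10(1 + phonebook_freq.get(name, 1))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_mentions(name, mentions):
    """Greedy clustering: merge a mention into the first cluster whose
    representative is similar enough under the name-specific threshold."""
    thr, clusters = threshold(name), []
    for feats in mentions:
        for cl in clusters:
            if jaccard(feats, cl[0]) >= thr:
                cl.append(feats)
                break
        else:
            clusters.append([feats])
    return clusters

# Context features (e.g. co-occurring words) for three documents.
mentions = [{"nlp", "fbk"}, {"fbk", "entailment"}, {"football", "goal"}]
print(len(cluster_mentions("roberto zanoli", mentions)), "clusters")
```

With the same evidence, the rare name merges the two NLP-flavoured mentions into one cluster, while a very common name would keep all three mentions apart.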


Archive | 2013

Anchoring Background Knowledge to Rich Multimedia Contexts in the KnowledgeStore

Roldano Cattoni; F. Corcoglioniti; Christian Girardi; Bernardo Magnini; Luciano Serafini; Roberto Zanoli

The recent achievements in Natural Language Processing in terms of scalability and performance, and the large availability of background knowledge within the Semantic Web and the Linked Open Data initiative, encourage researchers to take a further step towards the creation of machines capable of understanding multimedia documents by exploiting background knowledge. Pursuing this direction makes it necessary to maintain a clear link between knowledge and the documents containing it. This is achieved in the KnowledgeStore, a scalable content management system that supports the tight integration and storage of multimedia resources and of background and extracted knowledge. Integration is done by (i) identifying mentions of named entities in multimedia resources, (ii) establishing mention coreference, and either (iii) linking mentions to entities in the background knowledge, or (iv) extending that knowledge with new entities. We present the KnowledgeStore and describe its use in creating a large-scale repository of knowledge and multimedia resources in the Italian Trentino region, whose interlinking allows us to explore advanced tasks such as entity-based search and semantic enrichment.


Conference on Recommender Systems | 2017

Pokedem: an Automatic Social Media Management Application

Francesco Corcoglioniti; Claudio Giuliano; Yaroslav Nechaev; Roberto Zanoli

Typically, the task of managing the social media presence of a company or a public person is the job of a dedicated social media account manager. While many attempts have been made in recent years to provide more automation to account managers, complete workflow automation has yet to be achieved. Pokedem is a social media management application that aims at filling this gap by recommending actions that the account manager could perform to increase the popularity of a Twitter account and the engagement of its audience. By casting the problem in the setting of recommender systems, Pokedem is able to provide account managers with a complete tool for automating their daily activities.


Italian Natural Language Processing within the PARLI Project | 2015

Comparing Named Entity Recognition on Transcriptions and Written Texts

Firoj Alam; Bernardo Magnini; Roberto Zanoli

The ability to recognize named entities (e.g., person, location and organization names) in texts has proven to be an important task for several natural language processing areas, including Information Retrieval and Information Extraction. However, despite the efforts and the achievements obtained in Named Entity Recognition from written texts, the problem of recognizing named entities from automatic transcriptions of spoken documents is still far from being solved. In fact, the output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words, which makes them unavailable to the ASR. This paper presents a comparative analysis of extracting named entities both from written texts and from transcriptions. As for transcriptions, we have used spoken broadcast news, while for written texts we have used both newspapers from the same domain as the transcriptions and the manual transcriptions of the broadcast news. The comparison was carried out in a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.


International Workshop on Evaluation of Natural Language and Speech Tools for Italian | 2013

A Combination of Classifiers for Named Entity Recognition on Transcription

Firoj Alam; Roberto Zanoli

This paper presents a Named Entity Recognition (NER) system for broadcast news transcriptions in which two different classifiers are set up in a loop, so that the output of one classifier is exploited by the other to refine its decision. The approach we followed is similar to that used in Typhoon, a NER system designed for newspaper articles; in that respect, one of the distinguishing features of our approach is the use of Conditional Random Fields in place of Hidden Markov Models. To train the second classifier, we extracted sentences from a large unlabelled corpus. Another relevant feature is strictly related to the nature of transcriptions: transcriptions lack orthographic and punctuation information, which typically results in poor performance. For this reason, an additional module for case and punctuation restoration has been developed. This paper describes the system and reports its performance, evaluated by taking part in the Evalita 2011 task of Named Entity Recognition on Transcribed Broadcast News. In addition, the Evalita 2009 dataset, consisting of newspaper articles, is used to present a comparative analysis by extracting named entities from newspapers and broadcast news.
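The two-classifier loop can be illustrated with a deliberately tiny sketch: a first-pass tagger labels the tokens, and a second pass re-decides each token using the first pass's labels on neighbouring tokens as extra evidence. Both "classifiers" below are toy lexicon lookups standing in for the CRF models; the gazetteer, the title list, and the sentence are all invented.

```python
# Two-pass NER sketch: pass 1 tags tokens from a gazetteer; pass 2 refines
# unknown tokens using pass-1 labels of the left neighbour as a feature.
gazetteer = {"rome": "LOC", "mario": "PER"}
titles = {"mr", "mrs", "dr"}

def first_pass(tokens):
    return [gazetteer.get(t, "O") for t in tokens]

def second_pass(tokens, prev_labels):
    """Refine: an unknown token following a title, or following a token the
    first pass labelled PER, is promoted to PER (e.g. a surname)."""
    labels = []
    for i, (tok, lab) in enumerate(zip(tokens, prev_labels)):
        if lab == "O" and i > 0 and (tokens[i - 1] in titles
                                     or prev_labels[i - 1] == "PER"):
            lab = "PER"
        labels.append(lab)
    return labels

tokens = "mr zanoli met mario bianchi in rome".split()
labels = first_pass(tokens)
labels = second_pass(tokens, labels)   # one refinement iteration
print(list(zip(tokens, labels)))
```

Here "zanoli" (after a title) and "bianchi" (after a first-pass PER) are recovered in the second pass, which is the kind of refinement the loop is meant to provide; the real system iterates with learned models rather than rules.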

Collaboration


Dive into Roberto Zanoli's collaborations.

Top Co-Authors

Fabrizio Sebastiani

Qatar Computing Research Institute


Anna Corazza

University of Naples Federico II
