Publication


Featured research published by Alberto Lavelli.


Journal of Biomedical Semantics | 2011

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simone Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga

Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists at most of a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus was used for the First CALBC Challenge, which asked participants to annotate the corpus with their text processing solutions.
Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided the participant did not make use of the annotated data set from the SSC-I for training purposes. The performances of the participants' solutions were again measured against the SSC-II, and again showed better results for DISO and SPE in comparison to CHED and PRGE.
Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs' annotation solutions in comparison to the SSC-I.
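The F-measure reported in this abstract can be illustrated with a minimal sketch. The annotation sets below are invented for illustration (not CALBC data); the scoring itself is the standard exact-match precision/recall/F1 comparison of a system's entity annotations against a silver standard.

```python
# Sketch: exact-match precision, recall, and F1 of a system's entity
# annotations against a silver standard. Annotation sets are invented.

def prf1(gold, predicted):
    """Exact-match precision, recall, and F1 between two annotation sets."""
    tp = len(gold & predicted)  # spans found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Annotations as (doc_id, start, end, semantic_group) tuples.
silver = {(1, 0, 7, "DISO"), (1, 20, 28, "PRGE"),
          (2, 5, 12, "SPE"), (2, 30, 41, "CHED")}
system = {(1, 0, 7, "DISO"), (1, 20, 28, "PRGE"),
          (2, 5, 12, "SPE"), (2, 50, 55, "CHED")}

p, r, f = prf1(silver, system)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.75 R=0.75 F1=0.75
```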


International Conference on Machine Learning | 2005

Evaluating machine learning for information extraction

Neil Ireson; Fabio Ciravegna; Mary Elaine Califf; Dayne Freitag; Nicholas Kushmerick; Alberto Lavelli

Comparative evaluation of Machine Learning (ML) systems used for Information Extraction (IE) has suffered from various inconsistencies in experimental procedures. This paper reports on the results of the Pascal Challenge on Evaluating Machine Learning for Information Extraction, which provides a standardised corpus, set of tasks, and evaluation methodology. The challenge is described and the systems submitted by the ten participants are briefly introduced and their performance is analysed.


ACM Transactions on Speech and Language Processing | 2007

Relation extraction and the influence of automatic named-entity recognition

Claudio Giuliano; Alberto Lavelli; Lorenza Romano

We present an approach for extracting relations between named entities from natural language documents. The approach is based solely on shallow linguistic processing, such as tokenization, sentence splitting, part-of-speech tagging, and lemmatization. It uses a combination of kernel functions to integrate two different information sources: (i) the whole sentence where the relation appears, and (ii) the local contexts around the interacting entities. We present the results of experiments on extracting five different types of relations from a dataset of newswire documents and show that each information source provides a useful contribution to the recognition task. Usually the combined kernel significantly increases the precision with respect to the basic kernels, sometimes at the cost of a slightly lower recall. Moreover, we performed a set of experiments to assess the influence of the accuracy of named-entity recognition on the performance of the relation-extraction algorithm. Such experiments were performed using both the correct named entities (i.e., those manually annotated in the corpus) and the noisy named entities (i.e., those produced by a machine learning-based named-entity recognizer). The results show that our approach significantly improves the previous results obtained on the same dataset.
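The combination of a sentence-level kernel with local-context kernels can be sketched as follows. This is an illustrative toy, not the authors' implementation: it stands in bag-of-words dot products for their kernels and uses invented relation instances.

```python
# Illustrative sketch (not the paper's implementation): a sentence-level
# bag-of-words kernel combined with local-context kernels around the
# two candidate entities, summed into one kernel value.
from collections import Counter

def bow_kernel(tokens_a, tokens_b):
    """Dot product of token-count vectors (a simple linear kernel)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    return sum(ca[t] * cb[t] for t in ca if t in cb)

def local_context(tokens, entity_index, window=2):
    """Tokens within a small window around an entity mention."""
    lo = max(0, entity_index - window)
    return tokens[lo:entity_index + window + 1]

def combined_kernel(x, y, window=2):
    """Sum of the global (sentence) kernel and the two local kernels."""
    k_global = bow_kernel(x["tokens"], y["tokens"])
    k_local = sum(
        bow_kernel(local_context(x["tokens"], x[e], window),
                   local_context(y["tokens"], y[e], window))
        for e in ("e1", "e2"))
    return k_global + k_local

# Two toy relation instances: token list plus entity positions e1, e2.
x = {"tokens": ["acme", "corp", "is", "based", "in", "boston"], "e1": 0, "e2": 5}
y = {"tokens": ["foo", "inc", "is", "located", "in", "paris"], "e1": 0, "e2": 5}
print(combined_kernel(x, y))  # 4
```

A kernel built this way stays a valid kernel because a sum of kernels is itself a kernel, which is what makes the linear combination usable inside a support vector machine.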


Extended Semantic Web Conference | 2013

Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information

Alessio Palmero Aprosio; Claudio Giuliano; Alberto Lavelli

DBpedia is a project aiming to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Nowadays, only 1.7M Wikipedia pages are deeply classified in the DBpedia ontology, although the English Wikipedia contains almost 4M pages, showing a clear problem of coverage. In other languages (like French and Spanish) this coverage is even lower. The objective of this paper is to define a methodology to increase the coverage of DBpedia in different languages. The major problems that we have to solve concern the high number of classes involved in the DBpedia ontology and the lack of coverage for some classes in certain languages. In order to deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. Then, we train a supervised classifier using this extended set as training data. We evaluated our system using a manually annotated test set, demonstrating that our approach can add more than 1M new entities to DBpedia with high precision (90%) and recall (50%). The resulting resource is available through a SPARQL endpoint and a downloadable package.
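The cross-language step described above can be sketched with toy data (invented pages and a hypothetical two-class ontology, not the actual DBpedia pipeline): pages that are unclassified in one language inherit the ontology class of the English page they are cross-language-linked to.

```python
# Sketch of the cross-language expansion step (toy data, invented
# ontology classes): propagate class labels along cross-language links.

# English pages already classified in the (hypothetical) ontology.
english_classes = {
    "Rome": "City",
    "Tiber": "River",
}

# Cross-language links: (language, page title) -> English page title.
cross_links = {
    ("it", "Roma"): "Rome",
    ("fr", "Tibre"): "Tiber",
    ("es", "Roma"): "Rome",
}

def propagate(classes, links):
    """Extend class labels to linked pages in other languages."""
    return {page: classes[en] for page, en in links.items() if en in classes}

expanded = propagate(english_classes, cross_links)
print(expanded[("it", "Roma")])  # City
```

In the paper this extended set is then used as training data for a supervised classifier; the sketch covers only the label-propagation step.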


Natural Language Engineering | 2004

LearningPinocchio: adaptive information extraction for real world applications

Fabio Ciravegna; Alberto Lavelli

The new frontier of research on Information Extraction from texts is portability without any knowledge of Natural Language Processing. The market potential is in principle very large, provided that a suitable, easy-to-use and effective methodology is available. In this paper we describe LearningPinocchio, a system for adaptive Information Extraction from texts that has had good commercial and scientific success. Real-world applications have been built and evaluation licenses have been released to external companies for application development. We outline the basic algorithm behind the scenes and present a number of applications developed with LearningPinocchio. Then we report on an evaluation performed by an independent company. Finally, we discuss the general suitability of this IE technology for real-world applications and draw some conclusions.


Meeting of the Association for Computational Linguistics | 2007

FBK-IRST: Kernel Methods for Semantic Relation Extraction

Claudio Giuliano; Alberto Lavelli; Daniele Pighin; Lorenza Romano

We present an approach for semantic relation extraction between nominals that combines shallow and deep syntactic processing and semantic information using kernel methods. Two information sources are considered: (i) the whole sentence where the relation appears, and (ii) WordNet synsets and hypernymy relations of the candidate nominals. Each source of information is represented by kernel functions. In particular, five basic kernel functions are linearly combined and weighted under different conditions. The experiments were carried out using support vector machines as classifier. The system achieves an overall F1 of 71.8% on the Classification of Semantic Relations between Nominals task at SemEval-2007.


ACM Transactions on Speech and Language Processing | 2006

Automatic expansion of domain-specific lexicons by term categorization

Henri Avancini; Alberto Lavelli; Fabrizio Sebastiani; Roberto Zanoli

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each c_i in a predefined set C = {c_1, …, c_m} of semantic domains, an initial lexicon L_0^i into a larger lexicon L_1^i. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.


Conference of the European Chapter of the Association for Computational Linguistics | 1999

Full text parsing using cascades of rules: an information extraction perspective

Fabio Ciravegna; Alberto Lavelli

This paper proposes an approach to full parsing suitable for Information Extraction from texts. Sequences of cascades of rules deterministically analyze the text, building unambiguous structures. Initially, basic chunks are analyzed; then argumental relations are recognized; finally, modifier attachment is performed and the global parse tree is built. The approach was proven to work for three languages and different domains. It was implemented in the IE module of FACILE, an EU project for multilingual text classification and IE.


ACM Symposium on Applied Computing | 2003

Expanding domain-specific lexicons by term categorization

Henri Avancini; Alberto Lavelli; Bernardo Magnini; Fabrizio Sebastiani; Roberto Zanoli

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, …, c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents θ given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
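The dual representation described here can be sketched in a few lines: instead of representing a document as a vector of the terms it contains, each term is represented as a vector of the documents it occurs in. The toy corpus below is invented for illustration.

```python
# Sketch of the dual term representation: each term becomes a vector
# (here, a sorted id list) of the documents containing it. Toy corpus.

def term_vectors(corpus):
    """Map each term to the sorted list of document ids containing it."""
    vectors = {}
    for doc_id, text in corpus.items():
        for term in set(text.split()):  # set() ignores within-doc repeats
            vectors.setdefault(term, set()).add(doc_id)
    return {t: sorted(ids) for t, ids in vectors.items()}

corpus = {
    0: "stocks fell sharply on wall street",
    1: "the striker scored in the final minute",
    2: "bond yields and stocks both rallied",
}
vecs = term_vectors(corpus)
print(vecs["stocks"])  # [0, 2]
```

A term classifier trained over such vectors then labels terms with domains, exactly dual to labelling documents with categories in standard text categorization.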


Language Resources and Evaluation | 2008

Evaluation of machine learning-based information extraction algorithms: criticisms and recommendations

Alberto Lavelli; Mary Elaine Califf; Fabio Ciravegna; Dayne Freitag; Claudio Giuliano; Nicholas Kushmerick; Lorenza Romano; Neil Ireson

We survey the evaluation methodology adopted in information extraction (IE), as defined in a few different efforts applying machine learning (ML) to IE. We identify a number of critical issues that hamper comparison of the results obtained by different researchers. Some of these issues are common to other NLP-related tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter settings. Some issues are specific to IE: how leniently to assess inexact identification of filler boundaries, the possibility of multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an IE task, these issues should be explicitly addressed, and a number of methodological characteristics should be clearly defined. To empirically verify the practical impact of the issues mentioned above, we perform a survey of the results of different algorithms when applied to a few standard datasets. The survey shows a serious lack of consensus on these issues, which makes it difficult to draw firm conclusions on a comparative evaluation of the algorithms. Our aim is to elaborate a clear and detailed experimental methodology and propose it to the IE community. Widespread agreement on this proposal should lead to future IE comparative evaluations that are fair and reliable. To demonstrate the way the methodology is to be applied we have organized and run a comparative evaluation of ML-based IE systems (the Pascal Challenge on ML-based IE) where the principles described in this article are put into practice. In this article we describe the proposed methodology and its motivations. The Pascal evaluation is then described and its results presented.
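One of the issues raised here, how leniently to assess inexact identification of filler boundaries, can be made concrete with a small sketch (invented spans, simple precision-style scoring): the same system output scores differently under exact versus lenient (overlap) matching.

```python
# Sketch of the boundary-matching issue: exact vs. lenient (overlap)
# matching of extracted filler spans. Spans are invented for illustration.

def overlaps(a, b):
    """True if two (start, end) spans share at least one offset."""
    return a[0] < b[1] and b[0] < a[1]

def score(gold, predicted, lenient=False):
    """Fraction of predicted spans counted correct (precision-style)."""
    if lenient:
        hits = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    else:
        hits = sum(p in gold for p in predicted)
    return hits / len(predicted)

gold = [(0, 10), (20, 30)]
predicted = [(0, 10), (21, 30)]  # second span misses one character

print(score(gold, predicted))                # 0.5 under exact matching
print(score(gold, predicted, lenient=True))  # 1.0 under lenient matching
```

The gap between the two numbers is exactly the kind of methodological choice the paper argues should be stated explicitly when an IE task is specified.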

Collaboration

Dive into Alberto Lavelli's collaborations. Top co-authors:

Lorenza Romano (Fondazione Bruno Kessler)
Anna Corazza (University of Naples Federico II)
Roberto Zanoli (Fondazione Bruno Kessler)