Publication


Featured research published by Marco Basaldella.


Journal of Biomedical Semantics | 2017

Entity recognition in the biomedical domain using a hybrid approach

Marco Basaldella; Lenz Furrer; Carlo Tasso; Fabio Rinaldi

Background: This article describes a high-recall, high-precision approach for the extraction of biomedical entities from scientific articles. Method: The approach uses a two-stage pipeline, combining a dictionary-based entity recognizer with a machine-learning classifier. First, the OGER entity recognizer, which has a bias towards high recall, annotates the terms that appear in selected domain ontologies. Subsequently, the Distiller framework uses this information as a feature for a machine learning algorithm to select the relevant entities only. For this step, we compare two different supervised machine-learning algorithms: Conditional Random Fields and Neural Networks. Results: In an in-domain evaluation using the CRAFT corpus, we test the performance of the combined systems when recognizing chemicals, cell types, cellular components, biological processes, molecular functions, organisms, proteins, and biological sequences. Our best system combines dictionary-based candidate generation with Neural-Network-based filtering. It achieves an overall precision of 86% at a recall of 60% on the named entity recognition task, and a precision of 51% at a recall of 49% on the concept recognition task. Conclusion: These results are to our knowledge the best reported so far in this particular task.
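The two-stage idea in the abstract above (recall-oriented dictionary lookup followed by a precision-oriented filter) can be illustrated with a toy sketch. Nothing here is the OGER or Distiller API: the function names, the toy sentence, the dictionary, and the context rule standing in for the trained classifier are all hypothetical and chosen only to show how filtering trades candidates for precision.

```python
from typing import Callable

def dictionary_candidates(tokens: list[str], dictionary: set[str]) -> list[int]:
    """Stage 1: indices of tokens found in the dictionary (favours recall)."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in dictionary]

def filter_candidates(
    tokens: list[str],
    candidates: list[int],
    keep: Callable[[list[str], int], bool],
) -> list[int]:
    """Stage 2: keep only the candidates accepted by the filter (favours precision)."""
    return [i for i in candidates if keep(tokens, i)]

def precision_recall(predicted: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of predicted token indices against gold indices."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def context_rule(tokens: list[str], i: int) -> bool:
    """Hypothetical stand-in for a trained classifier (the source uses CRFs or
    neural networks): accept a candidate if a neighbouring token is a cue word."""
    cues = {"and", "despite", "exposure"}
    neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
    return any(tok.lower() in cues for tok in neighbours)

# Toy data (assumed): "lead" is a verb at index 1 and a chemical at index 5.
TOKENS = "Patients lead normal lives despite lead and iron exposure".split()
DICTIONARY = {"lead", "iron"}
GOLD = {5, 7}

stage1 = dictionary_candidates(TOKENS, DICTIONARY)
stage2 = filter_candidates(TOKENS, stage1, context_rule)

print("stage 1:", stage1, precision_recall(set(stage1), GOLD))
print("stage 2:", stage2, precision_recall(set(stage2), GOLD))
```

In this toy case the dictionary stage finds every gold mention but also the verb "lead", and the filter removes that false positive. A real filter would be learned from annotated data rather than written by hand.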


Italian Research Conference on Digital Library Management Systems | 2015

A Content-Based Approach to Social Network Analysis: A Case Study on Research Communities

Dario De Nart; Dante Degl’Innocenti; Marco Basaldella; Maristella Agosti; Carlo Tasso

Several works in the literature have investigated the activities of research communities using big data analysis, but the large majority of them focus on papers and co-authorship relations, ignoring that most of the available scientific literature is already clustered into journals and conferences with a well-defined domain of interest. We are interested in bringing out the implicit relationships among such containers. More specifically, we focus on conference and workshop proceedings available in open access, and we exploit a semantic/conceptual analysis of the full free-text content of each paper. We claim that such content-based analysis may lead to a better understanding of research communities’ activities and their emerging trends. In this work we present a novel method for analysing research communities’ activities, based on combining the results of a Social Network Analysis phase and a Content-Based one. The major innovative contribution of this work is the use of knowledge-based techniques to meaningfully extract from each of the considered papers the main topics discussed by its authors.


Italian Research Conference on Digital Library Management Systems | 2018

Bidirectional LSTM Recurrent Neural Network for Keyphrase Extraction

Marco Basaldella; Elisa Antolli; Giuseppe Serra; Carlo Tasso

To achieve state-of-the-art performance, keyphrase extraction systems rely on domain-specific knowledge and sophisticated features. In this paper, we propose a neural network architecture based on a Bidirectional Long Short-Term Memory Recurrent Neural Network that is able to detect the main topics in the input documents without the need to define new hand-crafted features. A preliminary experimental evaluation on the well-known INSPEC dataset confirms the effectiveness of the proposed solution.


International Conference on User Modeling, Adaptation, and Personalization | 2015

Modelling the User Modelling Community (and Other Communities as Well)

Dario De Nart; Dante Degl’Innocenti; Andrea Pavan; Marco Basaldella; Carlo Tasso

Discovering and modelling research communities’ activities is a task that can lead to a more effective scientific process and support the development of new technologies. Journals and conferences already offer an implicit clustering of researchers and research topics, and social analysis techniques based on co-authorship relations can highlight hidden relationships among researchers. However, little work has been done on the actual content of publications. We claim that a content-based analysis of the full text of accepted papers may lead to better modelling and understanding of communities’ activities and their emerging trends. In this work we present an extensive case study of research community modelling based upon the analysis of over 450 events and 7000 papers.


Italian Research Conference on Digital Library Management Systems | 2018

The Distiller Framework: Current State and Future Challenges

Marco Basaldella; Giuseppe Serra; Carlo Tasso

In 2015, we introduced a novel knowledge extraction framework called the Distiller Framework, with the goal of offering the research community a flexible, multilingual information extraction framework [3]. Two years later, the project has evolved significantly and now supports more languages and many machine learning algorithms. In this paper we present the current design of the framework and some of its applications.


WWW '18 Companion Proceedings of The Web Conference 2018 | 2018

Shut Up and Run: the Never-ending Quest for Social Fitness

Linda Anticoli; Marco Basaldella

In this paper we explore possible drawbacks in the use of wearable sensors, i.e., wearable devices used to detect different kinds of activity, from step and calorie counting to heart rate and sleep monitoring. These technologies, which in recent years have developed rapidly in terms of accuracy and diffusion, are now available on different platforms at reasonable prices and can lead to healthier behavior in the people using them. Nevertheless, we investigate potentially harmful behaviors related to these devices. We provide different scenarios in which wearable sensors, in connection with social media, data mining, or other technologies, could prove harmful for their users.


Recent Advances in Natural Language Processing | 2017

Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages.

Marco Basaldella; Muhammad Helmy; Elisa Antolli; Mihai Horia Popescu; Giuseppe Serra; Carlo Tasso

This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages which lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for languages which lack one. Then, we add a Machine Learning module trained on a well-known English language corpus and we evaluate the performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic language corpus to check whether using a language-specific corpus brings a further improvement in performance. On the five languages we analyzed, results show an improvement in performance when using a machine learning algorithm, even when the algorithm is not trained and tested on the same language.


International Conference on Asian Language Processing | 2016

Towards building a standard dataset for Arabic keyphrase extraction evaluation

Muhammad Helmy; Marco Basaldella; Eddy Maddalena; Stefano Mizzaro; Gianluca Demartini

Keyphrases are short phrases that best represent a document's content. They can be useful in a variety of applications, including document summarization and retrieval models. In this paper, we introduce the first dataset of keyphrases for an Arabic document collection, obtained by means of crowdsourcing. We experimentally evaluate different crowdsourced answer aggregation strategies and validate their performance against expert annotations to evaluate the quality of our dataset. We report on our experimental results, the dataset features, some lessons learned, and ideas for future work.
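The abstract above does not say which aggregation strategies were compared, so the following is only a generic sketch of one common option: keep a keyphrase if at least a minimum number of crowd workers proposed it, then score the aggregated set against expert annotations with F1. The function names, the vote threshold, and the toy answers are all assumptions made for illustration.

```python
from collections import Counter

def aggregate_by_votes(worker_answers: list[set[str]], min_votes: int) -> set[str]:
    """Keep every keyphrase proposed by at least `min_votes` workers."""
    votes = Counter(phrase for answers in worker_answers for phrase in answers)
    return {phrase for phrase, count in votes.items() if count >= min_votes}

def f1_against_experts(aggregated: set[str], expert: set[str]) -> float:
    """F1 score of the aggregated keyphrases against the expert keyphrases."""
    true_positives = len(aggregated & expert)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(aggregated)
    recall = true_positives / len(expert)
    return 2 * precision * recall / (precision + recall)

# Toy data (assumed): three workers annotate the same document.
WORKERS = [
    {"machine learning", "neural network"},
    {"machine learning", "data"},
    {"neural network", "machine learning"},
]
EXPERT = {"machine learning", "neural network"}

# Compare thresholds: a low threshold admits noise, a high one drops valid phrases.
for min_votes in (1, 2, 3):
    kept = aggregate_by_votes(WORKERS, min_votes)
    print(min_votes, sorted(kept), round(f1_against_experts(kept, EXPERT), 3))
```

Sweeping the threshold in this way is one simple method for comparing aggregation settings against an expert reference. Other strategies, such as weighting workers by reliability, would need a different aggregation function.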


National Conference on Artificial Intelligence | 2016

Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge

Eddy Maddalena; Marco Basaldella; Dario De Nart; Dante Degl'Innocenti; Stefano Mizzaro; Gianluca Demartini


International Conference on Computational Linguistics | 2016

Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction.

Marco Basaldella; Giorgia Chiaradia; Carlo Tasso

Collaboration


Dive into Marco Basaldella's collaborations.

Top Co-Authors

Giuseppe Serra

University of Modena and Reggio Emilia
