Davide Buscaldi | Researchain

Publication

Featured researches published by Davide Buscaldi.

international world wide web conferences | 2015

Sentiment Analysis on Microblogs for Natural Disasters Management: a Study on the 2014 Genoa Floodings

Davide Buscaldi; Irazú Hernandez-Farias

People use social networks for different communication purposes, for example to share their opinions on ongoing events. One way to exploit this common knowledge is to use Sentiment Analysis (SA) and Natural Language Processing to extract useful information. In this paper we present an SA approach applied to a set of tweets related to a recent natural disaster in Italy; our goal is to identify tweets that may provide useful information from a disaster management perspective.

exploiting semantic annotations in information retrieval | 2013

YaSemIR: yet another semantic information retrieval system

Davide Buscaldi; Haïfa Zargayouna

In this paper we present YaSemIR, a free open-source Semantic Information Retrieval system based on Lucene. It takes one or more ontologies in OWL format, and a terminology associated with each ontology in SKOS format, to semantically index a text collection. The terminology is used to annotate concepts in documents, while the ontology supplies the taxonomic information used to expand these concepts with their subsumers. YaSemIR is a flexible system that may be configured to work with different ontologies and on various types of documents.
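The expansion step described in this abstract can be illustrated with a minimal sketch. It is not YaSemIR's code: the toy taxonomy `PARENTS` and the function `expand_with_subsumers` are hypothetical names, and the sketch assumes the ontology has already been reduced to a mapping from each concept to its direct subsumers.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy taxonomy (an assumption, not taken from YaSemIR):
# each concept maps to its direct subsumers (parents).
PARENTS: Dict[str, Set[str]] = {
    "flood": {"natural_disaster"},
    "earthquake": {"natural_disaster"},
    "natural_disaster": {"event"},
    "event": set(),
}


def expand_with_subsumers(
    concepts: Iterable[str], parents: Dict[str, Set[str]]
) -> Set[str]:
    """Return the annotated concepts plus all of their ancestors."""
    expanded: Set[str] = set()
    stack = list(concepts)
    while stack:
        concept = stack.pop()
        if concept in expanded:
            continue  # already visited, avoids repeated work on shared ancestors
        expanded.add(concept)
        # Concepts missing from the taxonomy are kept but not expanded.
        stack.extend(parents.get(concept, ()))
    return expanded


# A document annotated with "flood" would also be indexed under its subsumers.
index_terms = expand_with_subsumers({"flood"}, PARENTS)
```

With this kind of expansion, a query on a general concept such as "natural_disaster" can match a document that only mentions "flood".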

advances in databases and information systems | 2014

Using the Semantics of Texts for Information Retrieval: A Concept- and Domain Relation-Based Approach

Davide Buscaldi; Marie-Noëlle Bessagnet; Albert Royer; Christian Sallaberry

Our hypothesis is that assessing the relevance of a document with respect to a query is equivalent to assessing the conceptual similarity between the terms of the query and those of the document. In this article, we therefore propose a method for calculating conceptual similarity. Our information retrieval strategy is based on exploring an ontology and the domain relations between concepts marked by verbal forms. The overall approach is implemented in a prototype, and the results obtained are evaluated. We show that a semantic IR system based on concepts improves recall with respect to a classic IR system, and that a semantic IR system based on concepts and domain relations improves precision with respect to IR based on concepts alone.
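The abstract does not give the similarity formula, so the following is only a sketch of the general idea of scoring a document by the conceptual similarity of its terms to the query terms. It substitutes the well-known Wu-Palmer measure over a toy single-parent taxonomy; the taxonomy `PARENT`, the function names, and the averaging scheme in `relevance` are all assumptions, not the authors' method.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy taxonomy (an assumption): each concept has one parent,
# and the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "place": "entity",
    "city": "place",
    "village": "place",
    "person": "entity",
}


def path_to_root(concept: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from a concept up to the root (concept first)."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def wu_palmer(a: str, b: str, parent: Dict[str, Optional[str]]) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    path_a = path_to_root(a, parent)
    path_b = path_to_root(b, parent)
    ancestors_b = set(path_b)
    # The first ancestor of a that also subsumes b is the deepest common one.
    lcs = next(node for node in path_a if node in ancestors_b)
    depth_lcs = len(path_to_root(lcs, parent))  # the root has depth 1
    return 2 * depth_lcs / (len(path_a) + len(path_b))


def relevance(
    query: Sequence[str], doc: Sequence[str], parent: Dict[str, Optional[str]]
) -> float:
    """Average, over query concepts, of the best match among document concepts."""
    if not query or not doc:
        return 0.0
    best = [max(wu_palmer(q, d, parent) for d in doc) for q in query]
    return sum(best) / len(query)


score = relevance(["city"], ["village", "person"], PARENT)
```

Here "city" and "village" share the parent "place", so they score higher than "city" and "person", whose only common ancestor is the root.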

north american chapter of the association for computational linguistics | 2015

QASSIT: A Pretopological Framework for the Automatic Construction of Lexical Taxonomies from Raw Texts

Guillaume Cleuziou; Davide Buscaldi; Gaël Dias; Vincent Levorato; Christine Largeron

This paper presents our participation in SemEval Task 17, “Taxonomy Extraction Evaluation” (Bordea et al., 2015). We propose a new methodology for semi-supervised and auto-supervised acquisition of lexical taxonomies from raw texts. Our approach is based on the theory of pretopology, which offers a powerful formalism to model subsumption relations and transforms a list of terms into a structured term space by combining different discriminant criteria. In order to reach a good pretopological space, we define the Learning Pretopological Spaces method, which learns a parameterized space by using an evolutionary strategy.

international conference on neural information processing | 2015

Correlating Open Rating Systems and Event Extraction from Text

Ehab Hassan; Davide Buscaldi; Aldo Gangemi

Event extraction is an important task in research on textual information. It can be applied to various types of written text, e.g. news messages, blogs, manuscripts, and user reviews of products or services. In this paper, we report the results of an experiment correlating event patterns obtained from machine reading with rankings derived from open rating systems. The experiment is performed in the tourism domain, where there is some evidence of misalignment between the two sources of opinion.

International Workshop on Semantic, Analytics, Visualization | 2016

A Typology of Semantic Relations Dedicated to Scientific Literature Analysis

Kata Gábor; Haïfa Zargayouna; Isabelle Tellier; Davide Buscaldi; Thierry Charnois

We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.

north american chapter of the association for computational linguistics | 2015

SOPA: Random Forests Regression for the Semantic Textual Similarity task

Davide Buscaldi; Jorge J. García Flores; Ivan Meza; Isaac Rodriguez

This paper describes the system used by the LIPN-IIMAS team in Task 2, Semantic Textual Similarity, at SemEval 2015, in both the English and Spanish sub-tasks. We included some features based on alignment measures and tested different learning models, in particular Random Forests, which proved to be the best among those used in our participation.

Proceedings of the 5th Spanish Conference on Information Retrieval | 2018

Improving access to scientific literature: a semantic IR perspective

Davide Buscaldi

Nowadays, the flow of data and publications in almost every field of research is continuously growing. Some estimates place the growth rate in the number of scientific publications between 2.2% and 14% per year, depending on the type and the domain of the publication [6]. This data deluge presents a bottleneck for scientific progress and a challenge for existing search engines. The problems to be solved are old ones: the ambiguity of a concept, especially across different research fields (for instance, lattice in computer science vs. physics), and the synonymy (or quasi-synonymy) of concepts that are expressed in different ways, for instance opinion mining and sentiment analysis. These issues may affect various tasks: a researcher building a state of the art for a specific topic, an editor finding reviewers for a given paper, or a government official studying a project proposal, among others. The need to go beyond mere document retrieval in the context of scientific literature is corroborated by the proliferation of related projects and works, and by the organization of new shared tasks, in particular the ScienceIE task at SemEval-2017, focused on the identification of keyphrases representing topics, methods, data and tools [1], and Task 7 at SemEval-2018 on semantic relation extraction and classification in scientific papers [3]. Some recent works address the problem with the help of structured lists of known keywords, such as Rexplore [7], which integrates statistical analysis with semantic technologies, or by analyzing the citation network among papers, as in CiteSpace [2]. In most cases, the relevance, or impact, of a paper is assessed by the number of citations it receives. However, Oren Etzioni observed that academics may cite papers for non-essential reasons: out of courtesy, for completeness, or to promote their own publications. These superfluous citations can impede literature searches and exaggerate a paper's importance; it is therefore necessary to use Artificial Intelligence to discover the meaning and the importance of a specific citation.
Recently, at LIPN we started working on access to scientific information from a semantic information retrieval perspective, leveraging ontologies and similar semantic resources for this task. The first step was to build a typology of semantic relations [4] that are often used in the state-of-the-art sections of scientific papers. Some of these relations link methods and the problems they solve; others link a resource and a system that used it. This typology can evolve or be integrated into more complex ontologies. The next step was to verify whether it is possible to detect these relations automatically. We focused on unsupervised methods that exploit the information coming from keywords and patterns around the entities connected by the relations, and tested the possibility of improving these results using semantic embeddings [5]. We produced a set of annotated documents that were used for Task 7 at SemEval-2018, where various participants showed the effectiveness of Deep Neural Network (DNN) methods for detecting and classifying the relations [3]. The results show that these methods can usually predict the type of a relation with high accuracy (85-90%) when they are given information about the linked entities, but much work remains on the detection of the relations themselves (about 50% for the best system).

north american chapter of the association for computational linguistics | 2016

LIPN-IIMAS at SemEval-2016 Task 1: Random Forest Regression Experiments on Align-and-Differentiate and Word Embeddings penalizing strategies.

Oscar William Lightgow Serrano; Ivan Vladimir Meza Ruiz; Albert Manuel Orozco Camacho; Jorge J. García Flores; Davide Buscaldi

This paper describes the SOPA-N system used by the LIPN-IIMAS team in SemEval 2016 Semantic Textual Similarity (Task 1). We based our work on the SOPA 2015 system, which used 16 similarity features (including WordNet, Information Retrieval and Syntactic Dependencies) within a Random Forest learning model. We expanded this system with an Align-and-Differentiate strategy, word embeddings and penalization, which yielded a 6.8% improvement on the development set. However, on the evaluation data for the 2016 STS shared task, the 2015 system outperformed our newer systems.

knowledge acquisition, modeling and management | 2016

Event-Based Recognition of Lived Experiences in User Reviews

Ehab Hassan; Davide Buscaldi; Aldo Gangemi

User reviews on the web are an important source of opinions on products and services. For a popular product or service, the number of reviews can be large, so it may be difficult for a potential customer to read all of them and make a decision. We hypothesize, and test, that lived experiences reported in reviews may increase a user's confidence in a review. We identify and extract such lived experiences with a novel technique based on machine reading. Our experimental results demonstrate the effectiveness of the technique.
