
Publications


Featured research published by Jorge Vivaldi.


Mexican International Conference on Artificial Intelligence | 2007

A new hybrid summarizer based on vector space model, statistical physics and linguistics

Iria da Cunha; Silvia Fernández; Patricia Velázquez Morales; Jorge Vivaldi; Eric SanJuan; Juan-Manuel Torres-Moreno

In this article we present a hybrid approach to the automatic summarization of Spanish medical texts. Many systems perform automatic summarization using either statistics or linguistics, but only a few combine both techniques. Our premise is that a good summary requires exploiting the linguistic properties of texts while also benefiting from the advantages of statistical techniques. We have integrated the Cortex (vector space model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We compared these systems and then integrated them in a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
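The core idea of combining several scoring systems can be sketched as follows. This is a minimal illustration with made-up sentence scores; the actual Cortex/Enertex/Disicosum integration is considerably more elaborate:

```python
# Minimal sketch of score fusion across summarizers (hypothetical scores).
def normalize(scores):
    """Scale a list of sentence scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(score_lists, k=2):
    """Average normalized scores from several systems; return top-k sentence indices."""
    norm = [normalize(s) for s in score_lists]
    fused = [sum(col) / len(col) for col in zip(*norm)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return ranked[:k]

# Scores for 4 sentences from three hypothetical component systems.
statistical_a = [0.9, 0.1, 0.5, 0.3]
statistical_b = [10.0, 2.0, 9.0, 1.0]
linguistic    = [1.0, 0.0, 1.0, 0.0]
print(hybrid_rank([statistical_a, statistical_b, linguistic], k=2))  # [0, 2]
```

Normalizing before averaging matters here: without it, a system that scores on a wider scale (like the second one above) would dominate the fusion.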


Conference on Intelligent Text Processing and Computational Linguistics | 2015

Medical Entities Tagging Using Distant Learning

Jorge Vivaldi; Horacio Rodríguez

We present a semantic tagger that detects relevant entities in medical documents and tags them with their appropriate semantic class. In the experiments described in this paper, the tagset consists of the six most frequent classes in the SNOMED CT taxonomy. The system uses six binary classifiers, and two mechanisms are presented for combining their results. The classifiers are learned from three widely used knowledge sources: one domain-restricted and two domain-independent resources. The system obtains state-of-the-art results.
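Combining per-class binary classifiers into a single tag can be sketched with two simple strategies. The class labels and confidence values below are illustrative stand-ins, not the paper's actual combination mechanisms:

```python
# Sketch: combining one-vs-rest binary classifiers into a single tag.
def combine_argmax(confidences):
    """Pick the class whose binary classifier is most confident."""
    return max(confidences, key=confidences.get)

def combine_threshold(confidences, threshold=0.5):
    """Accept a class only if exactly one classifier fires above threshold;
    otherwise fall back to the most confident one."""
    positives = [c for c, p in confidences.items() if p >= threshold]
    if len(positives) == 1:
        return positives[0]
    return combine_argmax(confidences)

# Hypothetical confidences from six binary classifiers for one entity.
scores = {"disorder": 0.81, "procedure": 0.30, "substance": 0.12,
          "body_structure": 0.05, "organism": 0.02, "finding": 0.44}
print(combine_threshold(scores))  # disorder
```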


INEX'10: Proceedings of the 9th International Conference on the Initiative for the Evaluation of XML Retrieval: Comparative Evaluation of Focused Retrieval | 2010

The REG summarization system with question reformulation at QA@INEX track 2010

Jorge Vivaldi; Iria da Cunha; Javier Ramírez

In this paper we present REG, a graph-based approach to a fundamental problem in Natural Language Processing: the automatic summarization of documents. The algorithm models a document as a graph in order to obtain weighted sentences. We applied this approach to the INEX@QA 2010 question-answering task. To do so, we extracted the terms and named entities from the queries in order to obtain a list of terms and named entities related to the main topic of each question. Using this strategy, REG obtained good results in performance (measured with the automatic evaluation system FRESA) and readability (measured with human evaluation), ranking among the seven best systems in the task.
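Modeling a document as a graph to weight sentences can be sketched with a TextRank-style power iteration over a sentence-similarity matrix. REG's exact algorithm may differ; this is only an illustration of the general technique:

```python
# Sketch of graph-based sentence weighting (TextRank-style power iteration).
def sentence_weights(sim, d=0.85, iters=50):
    """sim: symmetric sentence-similarity matrix (list of lists).
    Returns a PageRank-style weight per sentence."""
    n = len(sim)
    w = [1.0 / n] * n
    out = [sum(row) for row in sim]  # total outgoing edge weight per sentence
    for _ in range(iters):
        w = [(1 - d) / n + d * sum(sim[j][i] * w[j] / out[j]
                                   for j in range(n) if out[j] > 0)
             for i in range(n)]
    return w

# Toy similarity matrix: sentence 0 is connected to both others.
sim = [[0, 2, 1],
       [2, 0, 0],
       [1, 0, 0]]
w = sentence_weights(sim)
print(max(range(3), key=lambda i: w[i]))  # 0 (the most central sentence)
```

The highest-weighted sentences are then extracted to form the summary.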


International Workshop of the Initiative for the Evaluation of XML Retrieval | 2011

QA@INEX Track 2011: Question Expansion and Reformulation Using the REG Summarization System

Jorge Vivaldi; Iria da Cunha

In this paper, our strategy and results for the INEX@QA 2011 question-answering task are presented. In this task, a set of 50 documents is provided by the search engine Indri for each query. The initial queries are titles associated with tweets. These queries are reformulated using terminological and named-entity information. To design the queries, the full process is divided into two steps: a) both titles and tweets are POS-tagged, and b) queries are expanded or reformulated using terms and named entities included in the title, related terms and named entities found in the tweet, and Wikipedia redirects of the terms and named entities included in the title. The automatic summarization system REG is then used to summarize the 50 documents retrieved with these queries. The algorithm models a document as a graph to obtain weighted sentences. A single document is generated and considered the answer to the query. This strategy, combining summarization and question reformulation, obtains good results in informativeness and readability.
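The expansion step described above can be sketched as a union of title terms, related tweet terms, and redirect targets. The redirect table below is a stub standing in for a Wikipedia lookup; all data here is hypothetical:

```python
# Sketch of the query-expansion step (hypothetical data).
REDIRECTS = {"NYC": "New York City"}  # stand-in for a Wikipedia redirect lookup

def expand_query(title_terms, tweet_terms):
    """Union the title terms, related tweet terms, and redirect targets,
    preserving order and avoiding duplicates."""
    query = list(title_terms)
    query += [t for t in tweet_terms if t not in query]
    query += [REDIRECTS[t] for t in title_terms
              if t in REDIRECTS and REDIRECTS[t] not in query]
    return query

print(expand_query(["NYC", "marathon"], ["marathon", "runners"]))
# ['NYC', 'marathon', 'runners', 'New York City']
```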


Languages for Specific Purposes in the Digital Era, 2013, ISBN 9783319022215, pp. 223-242 | 2014

Automatic specialized vs. non-specialized text differentiation: the usability of grammatical features in a Latin multilingual context

M. Teresa Cabré; Iria da Cunha; Eric SanJuan; Juan-Manuel Torres-Moreno; Jorge Vivaldi

In this chapter we show that certain grammatical features, beyond the lexicon, have strong potential to differentiate specialized texts from non-specialized texts. A tool including these features has been developed and trained using machine learning techniques based on association rules, using two sub-corpora (specialized vs. non-specialized), each divided into training and test corpora. The tool has been evaluated, and the results show that the strategy is suitable for differentiating specialized from non-specialized texts. These results offer an innovative perspective for research in domains related to terminology, specialized discourse and computational linguistics, with applications to the automatic compilation of Languages for Specific Purposes (LSP) corpora and Adaptive Focused Information Retrieval (AFIR), among others.
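Classifying a text by firing mined association rules over its grammatical features can be sketched as follows. The rules and feature names below are illustrative inventions, not the ones the tool actually mined:

```python
# Sketch of rule-based classification from grammatical features
# (illustrative rules, not the mined ones).
RULES = [
    ({"passive", "nominalization"}, "specialized"),
    ({"first_person", "exclamation"}, "non-specialized"),
]

def classify(features, default="non-specialized"):
    """Fire the first association rule whose antecedent is fully
    contained in the text's feature set."""
    for antecedent, label in RULES:
        if antecedent <= features:
            return label
    return default

print(classify({"passive", "nominalization", "third_person"}))  # specialized
```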


International Conference on Computational Linguistics | 2011

Automatic specialized vs. non-specialized sentence differentiation

Iria da Cunha; M. Teresa Cabré; Eric SanJuan; Gerardo Sierra; Juan-Manuel Torres-Moreno; Jorge Vivaldi

Compiling Languages for Specific Purposes (LSP) corpora is a task fraught with difficulties (mainly time and human effort), because it is not easy to discern between specialized and non-specialized text. The aim of this work is to study automatic specialized vs. non-specialized sentence differentiation. The experiments are carried out on two corpora of sentences extracted from specialized and non-specialized texts: one in economics (academic publications and newspaper articles), the other about sexuality (academic publications and texts from forums and blogs). First we show the feasibility of the task using a statistical n-gram classifier. Then we show that grammatical features can also be used to classify sentences from the first corpus, using association rule mining.
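An n-gram approach to this kind of sentence classification can be sketched by scoring a sentence's character-trigram overlap against per-class profiles. The toy profiles below are invented, not drawn from the paper's corpora:

```python
# Sketch of an n-gram sentence classifier using character-trigram
# profile overlap (toy data).
from collections import Counter

def trigrams(text):
    """Count the character trigrams of a string."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def classify(sentence, profiles):
    """Pick the class whose trigram profile shares the most mass
    with the sentence's trigrams."""
    grams = trigrams(sentence.lower())
    def overlap(profile):
        return sum(min(c, profile[g]) for g, c in grams.items())
    return max(profiles, key=lambda cls: overlap(profiles[cls]))

profiles = {
    "specialized": trigrams("the fiscal deficit increased monetary policy rates"),
    "non-specialized": trigrams("prices went up and people are unhappy about it"),
}
print(classify("monetary policy and the deficit", profiles))  # specialized
```

In practice the profiles would be estimated from the training portion of each sub-corpus rather than from single example sentences.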


IEEE International Colloquium on Information Science and Technology | 2014

Arabic medical terms compilation from Wikipedia

Jorge Vivaldi; Horacio Rodríguez

Domain terms are a useful means for tuning both resources and NLP processors to domain-specific tasks. This paper proposes an improved method for obtaining terms from potentially any domain, using the Wikipedia graph structure as a knowledge source.
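The underlying idea of walking Wikipedia's graph structure from a domain seed can be sketched with a breadth-first traversal. The category fragment below is a toy stand-in, not live Wikipedia data, and the real method involves more than a plain walk:

```python
# Sketch: breadth-first walk of a category graph collecting candidate
# domain terms (toy graph, hypothetical fragment).
from collections import deque

GRAPH = {  # category -> subcategories/pages
    "Medicine": ["Diseases", "Anatomy"],
    "Diseases": ["Influenza", "Diabetes"],
    "Anatomy": ["Heart"],
}

def domain_terms(seed, max_depth=2):
    """Collect nodes reachable from the seed within max_depth hops."""
    terms, queue, seen = [], deque([(seed, 0)]), {seed}
    while queue:
        node, depth = queue.popleft()
        if depth > 0:
            terms.append(node)
        if depth < max_depth:
            for child in GRAPH.get(node, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return terms

print(domain_terms("Medicine"))
# ['Diseases', 'Anatomy', 'Influenza', 'Diabetes', 'Heart']
```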


Terminology | 2007

Evaluation of terms and term extraction systems: A practical approach

Jorge Vivaldi; Horacio Rodríguez


Terminology | 2001

Improving term extraction by combining different techniques

Jorge Vivaldi; Horacio Rodríguez


Procesamiento Del Lenguaje Natural | 2010

Using Wikipedia for term extraction in the biomedical domain: first experiences

Jorge Vivaldi; Horacio Rodríguez

Collaboration


Dive into Jorge Vivaldi's collaborations.

Top Co-Authors

Horacio Rodríguez (Polytechnic University of Catalonia)
Núria Bel (Pompeu Fabra University)
Juan-Manuel Torres-Moreno (École Polytechnique de Montréal)
Leo Wanner (Pompeu Fabra University)
Gerardo Sierra (National Autonomous University of Mexico)
Judit Feliu (Pompeu Fabra University)