Rinaldo Lima

Federal University of Pernambuco

Publication

Featured research published by Rinaldo Lima.

Expert Systems With Applications | 2013

Assessing sentence scoring techniques for extractive text summarization

Rafael Ferreira; Luciano de Souza Cabral; Rafael Dueire Lins; Gabriel de França Pereira e Silva; Fred Freitas; George D. C. Cavalcanti; Rinaldo Lima; Steven J. Simske; Luciano Favaro

Text summarization is the process of automatically creating a shorter version of one or more text documents. It is an important way of finding relevant information in large text libraries or on the Internet. Text summarization techniques are essentially classified as extractive or abstractive. Extractive techniques summarize text by selecting sentences from the documents according to some criteria. Abstractive summaries attempt to improve the coherence among sentences by eliminating redundancies and clarifying the context of sentences. Within extractive summarization, sentence scoring is the most widely used technique. This paper presents a quantitative and qualitative assessment of 15 sentence scoring algorithms available in the literature. Three datasets, covering news, blog, and article contexts, were evaluated. In addition, directions for improving the sentence extraction results are suggested.
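To make the idea of sentence scoring concrete, here is a minimal Python sketch of one classic scoring feature, word frequency: each sentence is scored by the average document frequency of its content words, and the top-k sentences are extracted in their original order. It is not one of the 15 algorithms as implemented in the paper; the stop-word list, the naive sentence splitter, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "is", "are",
              "it", "by", "for", "on", "with"}


def tokenize(text):
    """Return lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_frequency_scores(sentences):
    """Score each sentence by the mean document frequency of its content words."""
    content = [[w for w in tokenize(s) if w not in STOP_WORDS] for s in sentences]
    freq = Counter(w for words in content for w in words)
    return [sum(freq[w] for w in words) / len(words) if words else 0.0
            for words in content]


def summarize(text, k=2):
    """Extract the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = word_frequency_scores(sentences)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Other scoring features (sentence position, length, TF-IDF, cue words) would slot in by replacing word_frequency_scores while keeping the same selection step.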

Document Analysis Systems | 2014

A Context Based Text Summarization System

Rafael Ferreira; Frederico Luiz Gonçalves de Freitas; Luciano de Souza Cabral; Rafael Dueire Lins; Rinaldo Lima; Gabriel Franca; Steven J. Simske; Luciano Favaro

Text summarization is the process of creating a shorter version of one or more text documents. Automatic text summarization has become an important way of finding relevant information in large text libraries or on the Internet. Extractive text summarization techniques select entire sentences from documents according to some criteria to form a summary. Sentence scoring is currently the most widely used technique for extractive text summarization. Depending on the context, however, some techniques may yield better results than others. This paper advocates the thesis that the quality of the summary obtained with combinations of sentence scoring methods depends on the text subject. This hypothesis is evaluated in three contexts: news, blogs, and articles. The results confirm the hypothesis and point out which techniques are most effective in each of the contexts studied.
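The thesis that the best combination of scoring methods depends on the context can be illustrated with a minimal Python sketch: two simple features (sentence position and sentence length) are min-max normalized and combined linearly with per-context weights. The feature choice, the weight values in CONTEXT_WEIGHTS, and all function names are hypothetical; they are not the combinations evaluated in the paper.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def position_feature(sentences):
    """Earlier sentences get higher raw scores."""
    n = len(sentences)
    return [1.0 - i / n for i in range(n)]


def length_feature(sentences):
    """Raw score = number of whitespace-separated tokens."""
    return [float(len(s.split())) for s in sentences]


# Hypothetical per-context weights, NOT values from the paper.
CONTEXT_WEIGHTS = {
    "news": {"position": 0.6, "length": 0.4},
    "blogs": {"position": 0.3, "length": 0.7},
}


def combined_scores(sentences, context):
    """Weighted sum of normalized features, using the weights for `context`."""
    features = {
        "position": min_max(position_feature(sentences)),
        "length": min_max(length_feature(sentences)),
    }
    weights = CONTEXT_WEIGHTS[context]
    return [sum(weights[name] * features[name][i] for name in weights)
            for i in range(len(sentences))]
```

Switching the context key changes only the weights, so the same sentences can be ranked differently for news and for blogs, which is the effect the hypothesis is about.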

Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2013

A Four Dimension Graph Model for Automatic Text Summarization

Rafael Ferreira; Frederico Luiz Gonçalves de Freitas; Luciano de Souza Cabral; Rafael Dueire Lins; Rinaldo Lima; Gabriel Franca; Steven J. Simske; Luciano Favaro

Text summarization is the process of automatically creating a shorter version of one or more text documents. In this context, word-based, sentence-based, and graph-based approaches are widely used. Among these, graph-based methods for automatic text summarization produce summaries based on the relationships between sentences. These relationships may also support the creation of several text processing applications, such as extractive and abstractive summarizers, question-answering systems, and information retrieval systems. This paper proposes a new graph model for text processing applications. It relies on four dimensions (similarity, semantic similarity, coreference, and discourse information) to create the graph. The rationale behind the proposal is to use more dimensions than previous work and to apply coreference resolution, which accounts for the role of pronouns in connecting sentences. Coreference had not been used in any previous graph-based summarization technique. An experiment was performed on the CNN corpus using the TextRank algorithm with the proposed model. The results show that the proposed model outperforms current approaches both quantitatively and qualitatively.
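The ranking step can be pictured with a minimal Python sketch of TextRank over a sentence graph. Only one of the four dimensions is modeled here: edges are weighted with the word-overlap similarity from the original TextRank formulation, while the semantic-similarity, coreference, and discourse dimensions are omitted. The function names and the defaults (damping d = 0.85, 50 iterations) are assumptions for this sketch.

```python
import math
import re


def word_set(sentence):
    """Distinct lowercase words of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def overlap_similarity(s1, s2):
    """TextRank-style edge weight: shared words / (log|S1| + log|S2|).
    Sentence sizes are counted as distinct words here (a simplification)."""
    w1, w2 = word_set(s1), word_set(s2)
    if not w1 or not w2:
        return 0.0
    denom = math.log(len(w1)) + math.log(len(w2))
    return len(w1 & w2) / denom if denom > 0 else 0.0


def textrank(sentences, d=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph:
    WS(i) = (1 - d) + d * sum_j w[j][i] / sum_k w[j][k] * WS(j)."""
    n = len(sentences)
    w = [[overlap_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                                    for j in range(n) if out_weight[j] > 0)
                  for i in range(n)]
    return scores
```

In a multi-dimensional model such as the one described above, each extra dimension would contribute to the edge weights w[i][j]; the ranking loop itself stays unchanged.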

Expert Systems With Applications | 2013

RetriBlog: An architecture-centered framework for developing blog crawlers

Rafael Ferreira; Fred Freitas; Patrick H. S. Brito; Jean Melo; Rinaldo Lima; Evandro Costa

Blogs have become an important social tool. They allow users to share their tastes, express their opinions, report news, and form groups around a subject, among other uses. The information obtained from the blogosphere may be used to create applications in various fields. However, due to the growing number of blog posts published every day and the dynamic nature of the blogosphere, extracting relevant information from blogs has become difficult and time-consuming. In this paper, we use information retrieval and extraction techniques to deal with this problem. Furthermore, because blogs have many variation points, applications that can be easily adapted are required. Faced with this scenario, this work proposes RetriBlog, an architecture-centered framework for the development of blog crawlers. Finally, it presents an evaluation of the proposed algorithms and three case studies.

International Conference on Tools with Artificial Intelligence | 2013

Information Extraction from the Web: An Ontology-Based Method Using Inductive Logic Programming

Rinaldo Lima; Hilário Oliveira; Fred Freitas; Bernard Espinasse; Laura Pentagrossa

Extracting relevant information from text, and from web pages in particular, is a demanding and time-consuming task that requires substantial semantic resources. To be efficient, automatic information extraction systems therefore have to exploit semantic resources (or ontologies) and employ machine-learning techniques that make them more adaptive. This paper presents an ontology-based information extraction method using Inductive Logic Programming that induces symbolic predicates, expressed as Horn clauses, that subsume information extraction rules. These rules allow the system to extract class and relation instances from English corpora for ontology population. Several experiments were conducted, and the preliminary results are promising, showing that the proposed approach improves on previous work in extracting instances of classes and relations, either separately or together.

Ibero-American Conference on Artificial Intelligence | 2012

An Unsupervised Method for Ontology Population from the Web

Hilário Tomaz; Rinaldo Lima; João Emanoel; Fred Freitas

Knowledge engineers have had difficulty automatically constructing and populating domain ontologies, mainly due to the well-known knowledge acquisition bottleneck. In this paper, we attempt to alleviate this problem by proposing an iterative, unsupervised approach to identifying and extracting ontological class instances from the Web. The proposed approach treats the Web as a large corpus and relies on a confidence-weighted metric that uses semantic measures and web-scale statistics as evidence. Moreover, our iterative method is able to learn, to some extent, domain-specific linguistic patterns for extracting ontological class instances. We obtained encouraging results for the final ranking of candidate instances, as well as an accuracy of up to 97% for the patterns found by our method.
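Pattern-based instance extraction can be sketched with one classic lexico-syntactic (Hearst) pattern, "<class> such as X, Y and Z". The minimal Python sketch below is a stand-in for illustration, not the pattern set learned by the method; the capitalization heuristic for candidate instances and the function name are assumptions.

```python
import re


def such_as_candidates(text, class_name):
    """Candidate instances of `class_name` found with the Hearst pattern
    '<class_name> such as X, Y and Z'. Only capitalized word runs are kept,
    a crude proper-noun heuristic assumed for this sketch."""
    # Capture everything after "such as" up to the next clause/sentence break.
    pattern = re.compile(re.escape(class_name) + r"\s+such\s+as\s+([^.;:!?]+)")
    # List separators: ", ", ", and ", ", or ", " and ", " or ".
    separator = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")
    capitalized_run = re.compile(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*")
    candidates = []
    for match in pattern.finditer(text):
        for item in separator.split(match.group(1)):
            head = capitalized_run.match(item.strip())
            if head:
                candidates.append(head.group(0))
    return candidates
```

In an iterative setting, candidates found this way would then be scored (for example with web statistics) before being accepted as instances.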

Database and Expert Systems Applications | 2012

A Confidence-Weighted Metric for Unsupervised Ontology Population from Web Texts

Hilário Oliveira; Rinaldo Lima; João Gomes; Rafael Ferreira; Frederico Luiz Gonçalves de Freitas; Evandro Costa

Knowledge engineers have had difficulty automatically constructing and populating domain ontologies, mainly due to the well-known knowledge acquisition bottleneck. In this paper, we attempt to alleviate this problem by proposing an unsupervised approach for extracting class instances that uses the web as a large corpus and exploits linguistic patterns to identify and extract ontological class instances. The prototype implementation uses shallow syntactic parsing for disambiguation. In addition, we propose a confidence-weighted metric that combines different versions of the classical PMI metric, WordNet similarity measures, and heuristics to compute a final confidence score, improving the ranking of the candidate instances retrieved by the system. We conducted preliminary experiments comparing the proposed confidence metric against several versions of the PMI metric and obtained promising results for the final ranking of the candidate instances, with a precision gain of up to 24%.
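The PMI component can be sketched from raw hit counts: with N indexed pages, PMI(x, y) = log2(hits(x, y) · N / (hits(x) · hits(y))). The minimal Python sketch below also shows one hypothetical way to fold PMI, a WordNet-style similarity, and a heuristic score into a single confidence value; the weights, the clipping of PMI to [0, 20], and the function names are assumptions, not the metric defined in the paper. The WordNet similarity is passed in as a precomputed number rather than computed here.

```python
import math


def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information in bits, estimated from hit counts:
    log2(P(x, y) / (P(x) P(y))) = log2(hits_xy * total / (hits_x * hits_y))."""
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return float("-inf")  # no co-occurrence evidence
    return math.log2(hits_xy * total / (hits_x * hits_y))


def confidence(pmi_score, wordnet_sim, heuristic,
               weights=(0.5, 0.3, 0.2), pmi_cap=20.0):
    """Hypothetical combined score: PMI clipped to [0, pmi_cap] and rescaled
    to [0, 1], then mixed with a similarity and a heuristic score in [0, 1]."""
    pmi_part = max(0.0, min(pmi_score, pmi_cap)) / pmi_cap
    w_pmi, w_sim, w_heur = weights
    return w_pmi * pmi_part + w_sim * wordnet_sim + w_heur * heuristic
```

For example, a candidate whose pattern query returns 100 hits, with 1,000 hits for each term alone over 1,000,000 pages, gets PMI = log2(100), about 6.64 bits.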

Mexican International Conference on Artificial Intelligence | 2015

Automatic Summarization of News Articles in Mobile Devices

Luciano de Souza Cabral; Rinaldo Lima; Rafael Dueire Lins; Manoel Neto; Rafael Ferreira; Steven J. Simske; Marcelo Riss

Smartphones and tablets provide access to the Web anywhere and anytime. Automatic text summarization techniques aim to extract the fundamental information in documents. Making automatic summarization work on portable devices is challenging in several respects. This paper presents an automatic summarization application for Android devices. The proposed solution is a multi-feature, language-independent summarization application targeted at news articles. Several evaluations were conducted, and they indicate that the proposed solution provides good results.

ACM Symposium on Applied Computing | 2010

An adaptive information extraction system based on wrapper induction with POS tagging

Rinaldo Lima; Bernard Espinasse; Frederico Luiz Gonçalves de Freitas

Information Extraction (IE) performs two important tasks: identifying certain pieces of information in documents and storing them for future use. This work proposes an adaptive IE system based on Boosted Wrapper Induction (BWI), a supervised wrapper induction algorithm. However, some authors have shown that boosting techniques face difficulties when processing natural language texts. This motivated coupling Part-of-Speech (POS) tagging with the BWI algorithm in the proposed system. To evaluate its performance, several experiments were carried out on three standard corpora. The results suggest that combining POS tagging and BWI offers a small performance gain of 3 to 5% over the original BWI algorithm for unstructured texts. These results place our system among the best similar IE systems endowed with POS tagging, according to a comparison presented and discussed in the article.

International Conference on Tools with Artificial Intelligence | 2015

Relation Extraction from Texts with Symbolic Rules Induced by Inductive Logic Programming

Rinaldo Lima; Bernard Espinasse; Fred Freitas

Relation Extraction (RE) is the task of detecting semantic relations between entities in text. Most state-of-the-art RE systems rely on statistical machine learning techniques, which usually employ an attribute-value representation of features. Contrary to this trend, we focus on an alternative approach to RE based on the automatic induction of symbolic extraction rules. We present OntoILPER, an RE system based on Inductive Logic Programming that uses a domain ontology in its extraction process. This paper discusses several experiments on the reACE 2004/2005 reference corpora. The results are encouraging and seem to demonstrate the effectiveness of the proposed solution.
