Marcelo Riss
Hewlett-Packard
Publications
Featured research published by Marcelo Riss.
ACM Symposium on Document Engineering | 2014
Rafael Ferreira; Rafael Dueire Lins; Fred Freitas; Steven J. Simske; Marcelo Riss
Sentence similarity is used to measure the degree of likelihood between sentences. It is used in many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation. Current methods for assessing sentence similarity represent sentences as bag-of-words vectors or as the syntactic information of the words in the sentence. The degree of likelihood between phrases is calculated by composing the similarity between the words in the sentences. Two important concerns in the area, the meaning problem and word order, are not handled, however. This paper proposes a new sentence similarity assessment measure that largely improves and refines a recently published method that takes into account the lexical, syntactic, and semantic components of sentences. The new method proposed here was benchmarked using a publicly available standard dataset. The results obtained show that the new similarity assessment measure outperforms state-of-the-art systems and achieves results comparable to the evaluation made by humans.
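The bag-of-words representation that this line of work improves on can be sketched in a few lines. The example below is purely illustrative (not the authors' code) and shows why word order is invisible to such a representation:

```python
from collections import Counter
from math import sqrt

def bow_cosine(s1: str, s2: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (sqrt(sum(c * c for c in v1.values()))
            * sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

# Word order is invisible to this representation: the two sentences below
# have opposite meanings but identical word counts, so similarity is 1.0.
print(bow_cosine("the dog bit the man", "the man bit the dog"))  # 1.0
```

This is exactly the "word order" concern the abstract raises: any measure built only on word multisets cannot distinguish these two sentences.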
Expert Systems With Applications | 2016
Hilário Oliveira; Rafael Ferreira; Rinaldo Lima; Rafael Dueire Lins; Fred Freitas; Marcelo Riss; Steven J. Simske
Highlights: We investigate eighteen shallow sentence scoring techniques and ensemble strategies. Experiments were performed on several datasets for the single- and multi-document tasks. Ensemble strategies lead to improvements over the individual scoring techniques. Ensembles that perform competitively against the state of the art were identified.
The volume of text data has been growing exponentially in recent years, mainly due to the Internet. Automatic Text Summarization has emerged as an alternative to help users find relevant information in the content of one or more documents. This paper presents a comparative analysis of eighteen shallow sentence scoring techniques to compute the importance of a sentence in the context of extractive single- and multi-document summarization. Several experiments were conducted to assess the performance of such techniques both individually and under different combination strategies. The most traditional benchmark in the news domain demonstrates the feasibility of combining such techniques, in most cases outperforming the results obtained by isolated techniques. Combinations that perform competitively with state-of-the-art systems were found.
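As a rough illustration of such an ensemble (a minimal sketch, not the paper's eighteen techniques), the code below combines two shallow scores, word frequency and sentence position, by normalizing each to [0, 1] and averaging:

```python
def score_sentences(sentences):
    """Ensemble of two shallow sentence scoring techniques (illustrative)."""
    words = [s.lower().split() for s in sentences]
    freq = {}
    for ws in words:
        for w in ws:
            freq[w] = freq.get(w, 0) + 1
    # Technique 1: average corpus frequency of the sentence's words
    tf = [sum(freq[w] for w in ws) / len(ws) for ws in words]
    # Technique 2: position -- earlier sentences score higher (news domain)
    pos = [1.0 - i / len(sentences) for i in range(len(sentences))]

    def norm(xs):  # min-max normalize each technique before combining
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in xs]

    return [(a + b) / 2 for a, b in zip(norm(tf), norm(pos))]

def summarize(sentences, k=1):
    """Extractive summary: the k best-scored sentences, in document order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

The normalization step matters: individual techniques produce scores on incompatible scales, so they must be rescaled before any averaging-style combination is meaningful.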
ACM Symposium on Document Engineering | 2015
Jamilson Batista; Rodolfo Ferreira; Hilário Tomaz; Rafael Ferreira; Rafael Dueire Lins; Steven J. Simske; Gabriel de França Pereira e Silva; Marcelo Riss
Text summarization is the process of automatically creating a shorter version of one or more text documents. This paper presents a qualitative and quantitative assessment of 22 state-of-the-art extractive summarization systems using the CNN corpus, a dataset of 3,000 news articles.
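Quantitative assessment of extractive summaries is commonly done with n-gram overlap measures such as ROUGE (the abstract does not name the metric, so this is an assumption). A minimal ROUGE-1 recall sketch:

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Minimal ROUGE-1 recall: fraction of reference unigrams covered
    by the candidate summary (assumes whitespace tokenization)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each reference unigram counts at most as often
    # as it appears in the candidate.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

Real evaluations use the full ROUGE family (ROUGE-2, ROUGE-L, stemming, stopword handling); this sketch only conveys the core recall computation.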
Computer Speech & Language | 2016
Rafael Ferreira; Rafael Dueire Lins; Steven J. Simske; Fred Freitas; Marcelo Riss
The degree of similarity between sentences is assessed by sentence similarity methods. Sentence similarity methods play an important role in areas such as summarization, search, categorization of texts, and machine translation. Current methods for assessing sentence similarity are based only on the similarity between the words in the sentences. Such methods either represent sentences as bag-of-words vectors or are restricted to the syntactic information of the sentences. Two important problems in language understanding are not addressed by such strategies: word order and the meaning of the sentence as a whole. The new sentence similarity assessment measure presented here largely improves and refines a recently published method that takes into account the lexical, syntactic, and semantic components of sentences. The new method was benchmarked on the Li–McLean dataset, showing that it outperforms state-of-the-art systems and achieves results comparable to the evaluation made by humans. Besides that, the proposed method was extensively tested using the SemEval 2012 sentence similarity test set and in the evaluation of the degree of similarity between summaries using the CNN-corpus. In both cases, the measure proposed here proved effective and useful.
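One common formulation of the word-order component that such measures add on top of word similarity compares position vectors over the joint word set (a simplified sketch, not the authors' implementation):

```python
from math import sqrt

def word_order_similarity(s1: str, s2: str) -> float:
    """Word-order similarity as 1 - |r1 - r2| / |r1 + r2|, where r_k
    encodes each joint-set word's position in sentence k (0 if absent)."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(w1 + w2))  # joint word set, order-preserving
    r1 = [w1.index(w) + 1 if w in w1 else 0 for w in joint]
    r2 = [w2.index(w) + 1 if w in w2 else 0 for w in joint]
    diff = sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    summ = sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / summ if summ else 1.0
```

Unlike a bag-of-words measure, this score drops below 1.0 when the same words appear in a different order, which is exactly the signal the word-order problem requires.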
Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014
Rafael Ferreira; Rafael Dueire Lins; Frederico Luiz Gonçalves de Freitas; Bruno Tenório Ávila; Steven J. Simske; Marcelo Riss
Sentence similarity methods are used to assess the degree of likelihood between phrases. Many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation, employ measures of sentence similarity. Existing approaches to this problem represent sentences as bag-of-words vectors or as the syntactic information of the words in the phrase. The likelihood between phrases is calculated by composing the similarity between the words in the sentences. Such schemes do not address two important concerns in the area, however: the semantic problem and word order. This paper proposes a new sentence similarity measure that attempts to address these problems by taking into account the lexical, syntactic, and semantic analysis of sentences. The proposed similarity measure outperforms state-of-the-art systems by around 6% when tested on a standard, publicly available dataset.
ACM Symposium on Document Engineering | 2015
Gabriel de França Pereira e Silva; Rafael Ferreira; Rafael Dueire Lins; Luciano de Souza Cabral; Hilário Oliveira; Steven J. Simske; Marcelo Riss
The need for automatic generation of summaries gained importance with the unprecedented volume of information available on the Internet. Automatic systems based on extractive summarization techniques select the most significant sentences of one or more texts to generate a summary. This article makes use of machine learning techniques to assess the quality of the twenty most referenced strategies used in extractive summarization, integrating them into a tool. Quantitative and qualitative aspects were considered in the assessment, demonstrating the validity of the proposed scheme. The experiments were performed on the CNN-corpus, possibly the largest and most suitable test corpus today for benchmarking extractive summarization strategies.
ACM Symposium on Document Engineering | 2014
Luciano de Souza Cabral; Rafael Dueire Lins; Rafael Fe Mello; Fred Freitas; Bruno Tenório Ávila; Steven J. Simske; Marcelo Riss
The text data available on the Internet is huge not only in volume, but also in diversity of subject, quality, and idiom. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for addressing this problem, because it aims to sieve out the relevant information in documents by creating shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for the English language, which is a severe restriction, and existing multilingual platforms support, at most, two languages. This paper proposes a language-independent summarization platform that provides corpus acquisition, language classification, translation, and text summarization for 25 different languages.
Mexican International Conference on Artificial Intelligence | 2015
Luciano de Souza Cabral; Rinaldo Lima; Rafael Dueire Lins; Manoel Neto; Rafael Ferreira; Steven J. Simske; Marcelo Riss
Smartphones and tablets provide access to the Web anywhere and anytime. Automatic text summarization techniques aim to extract the fundamental information in documents. Making automatic summarization work on portable devices is a challenge in several respects. This paper presents an automatic summarization application for Android devices. The proposed solution is a multi-feature, language-independent summarization application targeted at news articles. Several evaluation assessments were conducted and indicate that the proposed solution provides good results.
Computer Speech & Language | 2018
Rafael Ferreira; George Darmiton da Cunha Cavalcanti; Fred Freitas; Rafael Dueire Lins; Steven J. Simske; Marcelo Riss
Highlights: We propose a new paraphrase identification system based on lexical, syntactic, and semantic analysis. Different machine learning algorithms are used to classify the sentence pairs. The measure was evaluated on the standard Microsoft Paraphrase Corpus.
Paraphrase identification is the process of verifying whether two sentences are semantically equivalent. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, paraphrase identification methods perform three steps. First, they represent sentences as vectors using bag of words or the syntactic information of the words present in the sentence. Next, this representation is used to compute different similarities between the two sentences. In the third step, these similarities are given as input to a machine learning algorithm that classifies the two sentences as a paraphrase or not. However, two important problems in the area of paraphrase identification are not handled: (i) the meaning problem: two sentences sharing the same meaning but composed of different words; and (ii) the word order problem: the order of the words in a sentence may change the meaning of the text. This paper proposes a paraphrase identification system that represents each pair of sentences as a combination of different similarity measures. These measures extract the lexical, syntactic, and semantic components of the sentences, encompassed in a graph. The proposed method was benchmarked using the Microsoft Paraphrase Corpus, the publicly available standard dataset for the task. Different machine learning algorithms were applied to classify a sentence pair as a paraphrase or not. The results show that the proposed method outperforms state-of-the-art systems.
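The three-step pipeline the abstract describes, similarity features followed by a classifier, can be sketched with toy features and a stand-in linear decision rule (the weights, threshold, and both features are illustrative assumptions; the actual system combines many lexical, syntactic, and semantic measures and trains a real ML model):

```python
def paraphrase_features(s1: str, s2: str):
    """Toy feature vector: lexical overlap (Jaccard) and length ratio."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0
    ratio = min(len(t1), len(t2)) / max(len(t1), len(t2)) if t1 and t2 else 0.0
    return [jaccard, ratio]

def is_paraphrase(s1: str, s2: str, weights=(0.8, 0.2), threshold=0.55) -> bool:
    """Stand-in linear classifier over the feature vector; in the paper
    this decision is made by a trained machine learning algorithm."""
    score = sum(w * f for w, f in zip(weights, paraphrase_features(s1, s2)))
    return score >= threshold
```

The key design point the abstract makes is that the feature step, not the classifier, is where the meaning and word-order problems must be solved; purely lexical features like the ones above are exactly what the proposed graph-based measures replace.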
ACM Symposium on Document Engineering | 2014
Rinaldo Lima; Jamilson Batista; Rafael Ferreira; Frederico Luiz Gonçalves de Freitas; Rafael Dueire Lins; Steven J. Simske; Marcelo Riss
Relation extraction (RE) aims at finding how entities, such as persons, locations, organizations, dates, etc., depend upon each other in a text document. Ontology population, automatic summarization, and question answering are fields in which relation extraction offers valuable solutions. A relation extraction method based on inductive logic programming, which induces extraction rules suitable for identifying semantic relations between entities, was proposed by the authors in a previous work. This paper proposes a method that simplifies the graph-based representations of sentences, replacing the dependency graphs of sentences with simpler ones while keeping the target entities in them. The goal is to speed up the learning phase in an RE framework by applying several graph-simplification rules that constrain the hypothesis space for generating extraction rules. The direct impact on extraction performance is also investigated. The proposed techniques outperformed other state-of-the-art systems when assessed on two standard datasets for relation extraction in the biomedical domain.
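One simple graph-simplification rule of the kind the abstract describes is reducing a sentence's dependency graph to the shortest path between the two target entities, since that path usually carries the relation. The sketch below (an illustration under that assumption, not the paper's rule set) does this with a plain BFS over an undirected adjacency map:

```python
from collections import deque

def simplify_graph(edges, e1, e2):
    """Reduce a dependency graph (list of (head, dependent) edges) to the
    shortest path between two target entities; [] if they are unconnected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # BFS from e1, recording each node's predecessor to rebuild the path.
    prev, queue, seen = {}, deque([e1]), {e1}
    while queue:
        u = queue.popleft()
        if u == e2:
            break
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                prev[v] = u
                queue.append(v)
    if e2 not in seen:
        return []
    path = [e2]
    while path[-1] != e1:
        path.append(prev[path[-1]])
    return path[::-1]
```

Pruning nodes off this path shrinks the hypothesis space the rule learner must search, which is the speed-up the paper targets.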