Thierson Couto Rosa | Researchain

Publication

Featured research published by Thierson Couto Rosa.

International Journal of Medical Informatics | 2016

SentiHealth-Cancer: A sentiment analysis tool to help detecting mood of patients in online social networks

Ramon Gouveia Rodrigues; Rafael Marques das Dores; Celso G. Camilo-Junior; Thierson Couto Rosa

BACKGROUND: Cancer is a critical disease that affects millions of people and families around the world. In 2012 about 14.1 million new cases of cancer occurred globally. For reasons such as the severity of some cases, the side effects of some treatments and the death of other patients, cancer patients tend to be affected by serious emotional disorders such as depression. Thus, monitoring the mood of patients is an important part of their treatment. Many cancer patients use online social networks, and many of them take part in virtual cancer communities where they exchange messages about their treatment or give support to other patients in the community. Most of these communities are publicly accessible and are therefore useful sources of information about the mood of patients. Sentiment Analysis methods can thus be used to automatically detect positive or negative mood in cancer patients by analyzing their messages in these online communities.

OBJECTIVE: The objective of this work is to present a Sentiment Analysis tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese. SHC-pt is tailored specifically to detect positive, negative or neutral messages of patients in online communities of cancer patients. We conducted a comparative study of the proposed method against a set of general-purpose sentiment analysis tools adapted to this context.

METHODS: Different collections of posts were obtained from two cancer communities on Facebook. The posts were analyzed by sentiment analysis tools that support the Portuguese language (Semantria and SentiStrength) and by the tool SHC-pt, developed from the method proposed in this paper, called SentiHealth. Moreover, as a second alternative for analyzing the texts in Portuguese, the collected texts were automatically translated into English and submitted to sentiment analysis tools that do not support the Portuguese language (AlchemyAPI and Textalytics) and also to Semantria and SentiStrength, using the English option of these tools. Six experiments were conducted with some variations and different origins of the collected posts. The results were measured using the following metrics: precision, recall, F1-measure and accuracy.

RESULTS: The proposed tool SHC-pt reached the best averages for accuracy and F1-measure (the harmonic mean of recall and precision) in the three sentiment classes addressed (positive, negative and neutral) in all experimental settings. Moreover, the worst accuracy value (58%) achieved by SHC-pt in any experiment is 11.53% higher, in relative terms, than the greatest accuracy (52%) presented by the other tools. Finally, the worst average F1 (48.46%) reached by SHC-pt in any experiment is 4.14% higher than the greatest average F1 (46.53%) achieved by the other tools. Thus, even when SHC-pt's results in a complex scenario are compared with the other tools' results in an easier scenario, SHC-pt performs better.

CONCLUSIONS: This paper presents two contributions. First, it proposes the SentiHealth method to detect the mood of cancer patients who are also users of patient communities in online social networks. Second, it presents a tool instantiated from the method, called SentiHealth-Cancer (SHC-pt), dedicated to automatically analyzing posts in communities of cancer patients. This context-tailored tool outperformed general-purpose sentiment analysis tools, at least in the cancer context. This suggests that the SentiHealth method could be instantiated as other disease-based tools in future work, for instance SentiHealth-HIV, SentiHealth-Stroke and SentiHealth-Sclerosis.
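The metrics named in this abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function names and the six toy labels are assumptions made for illustration, and only the 58% versus 52% accuracy figures in the last line come from the abstract.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]

def per_class_metrics(gold, predicted, labels):
    """Return {label: (precision, recall, f1)} for a multi-class task."""
    pairs = Counter(zip(gold, predicted))
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (g, p), n in pairs.items() if p == label and g != label)
        fn = sum(n for (g, p), n in pairs.items() if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

def accuracy(gold, predicted):
    """Fraction of messages whose predicted class matches the gold class."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

# Toy labels (hypothetical, not taken from the paper's collections).
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

scores = per_class_metrics(gold, pred, LABELS)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(LABELS)
print(scores, accuracy(gold, pred), macro_f1)

# The accuracy comparison quoted in the abstract: 58% versus 52%.
print(relative_gain(58.0, 52.0))
```

Averaging F1 over the three classes, as `macro_f1` does here, is one common way to obtain an "average F1"; the abstract does not say which averaging scheme the authors used.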

Congress on Evolutionary Computation | 2014

An evolutionary approach for combining results of recommender systems techniques based on Collaborative Filtering

Edjalma Q. da Silva; Celso G. Camilo; Luiz Mario L. Pascoal; Thierson Couto Rosa

Recommendation systems work as counselors, guiding people in the discovery of products of interest. The literature offers various techniques and approaches for generating recommendations. This diversity of options is valuable, but it can leave the system designer in doubt about which technique is best. Each approach has its own particularities and depends on the context in which it is applied, so choosing among techniques becomes too complex to do manually. This article proposes an evolutionary approach for combining the results of recommendation techniques in order to automate the choice of techniques and reduce recommendation errors. To evaluate the proposal, experiments were performed with a MovieLens dataset and several Collaborative Filtering techniques. The results show that the proposed combining methodology performs better than any single collaborative filtering technique in the context addressed. The improvement varies from 9.02% to 48.21% depending on the technique and the experiment executed.
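One way to picture the idea of combining techniques through evolutionary search is sketched below. It is an assumption-laden illustration, not the method of the paper: the weighted-average combination, the RMSE fitness, the mutation scheme and the toy ratings are all hypothetical choices.

```python
import random

def rmse(predictions, truth):
    """Root mean squared error between predicted and true ratings."""
    return (sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)) ** 0.5

def combine(weights, technique_predictions):
    """Weighted average of the per-technique predictions for each rating."""
    total = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, technique_predictions)) / total
        for i in range(len(technique_predictions[0]))
    ]

def evolve_weights(technique_predictions, truth, generations=60, population_size=20, seed=0):
    """Search for combination weights with a small elitist evolutionary loop."""
    rng = random.Random(seed)
    n = len(technique_predictions)

    def fitness(weights):
        return rmse(combine(weights, technique_predictions), truth)

    # Start from each technique alone (one-hot weights), then add random mixes.
    population = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    while len(population) < population_size:
        population.append([rng.random() + 1e-9 for _ in range(n)])

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: population_size // 2]  # elitism: best half is kept
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = rng.sample(survivors, 2)
            # Arithmetic crossover followed by Gaussian mutation, clipped at zero.
            child = [max(0.0, (x + y) / 2 + rng.gauss(0, 0.1)) for x, y in zip(a, b)]
            if sum(child) == 0.0:
                child = list(a)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

# Hypothetical ratings and predictions from two made-up techniques.
truth = [4.0, 3.0, 5.0, 2.0, 1.0]
technique_predictions = [
    [4.5, 3.5, 5.0, 2.5, 1.5],  # tends to overestimate
    [3.5, 2.5, 4.5, 1.5, 0.5],  # tends to underestimate
]
best = evolve_weights(technique_predictions, truth)
print(best, rmse(combine(best, technique_predictions), truth))
```

Because the single-technique weightings are part of the initial population and the best half always survives, the returned combination is never worse on the training ratings than the best individual technique.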

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015

An Efficient and Scalable MetaFeature-based Document Classification Approach based on Massively Parallel Computing

Sérgio D. Canuto; Marcos André Gonçalves; Wisllay M. V. dos Santos; Thierson Couto Rosa; Wellington Santos Martins

The unprecedented growth of available data has stimulated the development of new methods for organizing and extracting useful knowledge from this immense amount of data. Automatic Document Classification (ADC) is one such method; it uses machine learning techniques to build models capable of automatically associating documents with well-defined semantic classes. ADC is the basis of many important applications such as language identification, sentiment analysis, recommender systems and spam filtering, among others. Recently, the use of meta-features has been shown to substantially improve the effectiveness of ADC algorithms. In particular, meta-features that combine local information (through kNN-based features) and global information (through category centroids) have produced promising results. However, generating these meta-features is very costly in terms of both memory consumption and runtime, since the kNN algorithm must be called constantly. We take advantage of the current manycore GPU architecture and present a massively parallel version of the kNN algorithm for high-dimensional and sparse datasets (which is the case for ADC). Our experimental results show that we can obtain speedups of up to 15x while reducing memory consumption by more than 5000x when compared to a state-of-the-art parallel baseline. This opens up the possibility of applying meta-feature-based classification to large collections of documents that would otherwise take too much time or require an expensive computational platform.

Conference on Information and Knowledge Management | 2014

On Efficient Meta-Level Features for Effective Text Classification

Sérgio D. Canuto; Thiago Salles; Marcos André Gonçalves; Leonardo C. da Rocha; Gabriel Ramos; Luiz Alberto Oliveira Gonçalves; Thierson Couto Rosa; Wellington Santos Martins

This paper addresses the problem of automatically learning to classify texts by exploiting information derived from meta-level features (i.e., features derived from the original bag-of-words representation). We propose new meta-level features derived from the class distribution, the entropy and the within-class cohesion observed in the k nearest neighbors of a given test document x, as well as from the distribution of distances of x to these neighbors. The set of proposed features is capable of transforming the original feature space into a new one, potentially smaller and more informed. Experiments performed with several standard datasets demonstrate that the effectiveness of the proposed meta-level features is not only far superior to that of the traditional bag-of-words representation but also superior to that of other state-of-the-art meta-level features previously proposed in the literature. Moreover, the proposed meta-features can be computed about three times faster than the existing meta-level ones, making our proposal much more scalable. We also demonstrate that the combination of our meta-features and the original set of features produces significant improvements when compared to each feature set used in isolation.
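A rough sketch of what kNN-derived meta-level features can look like follows. It covers only the class distribution, its entropy and the neighbor distances; the within-class cohesion feature is left out, and the cosine distance, the function names and the toy vectors are assumptions rather than the paper's definitions.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity of two dense vectors (1.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def knn_meta_features(x, train_vectors, train_labels, classes, k=3):
    """Build a meta-level feature vector for document x from its k nearest neighbors.

    Layout: [class distribution..., entropy of that distribution, sorted distances...]
    """
    neighbors = sorted(
        (cosine_distance(x, vector), label)
        for vector, label in zip(train_vectors, train_labels)
    )[:k]
    counts = Counter(label for _, label in neighbors)
    distribution = [counts[c] / len(neighbors) for c in classes]
    entropy = -sum(p * math.log2(p) for p in distribution if p > 0)
    distances = [d for d, _ in neighbors]
    return distribution + [entropy] + distances

# Toy two-term bag-of-words vectors (hypothetical data).
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["sports", "sports", "politics", "politics"]
classes = ["sports", "politics"]

features = knn_meta_features([1.0, 0.0], train_vectors, train_labels, classes, k=3)
print(features)
```

With k neighbors and c classes the new representation has c + 1 + k dimensions in this sketch, which shows how such a transformation can yield a much smaller space than the original vocabulary.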

Web Information Systems Engineering | 2012

Improving on-demand learning to rank through parallelism

Daniel Xavier de Sousa; Thierson Couto Rosa; Wellington Santos Martins; Rodrigo M. Silva; Marcos André Gonçalves

Traditional Learning to Rank (L2R) is usually conducted in batch mode, in which a single ranking function is learned to order results for future queries. This approach is not flexible, since future queries may differ considerably from those present in the training set and, consequently, the learned function may not work properly. Ideally, a distinct ranking function should be learned on demand for each query. Nevertheless, on-demand L2R may significantly degrade query processing time, as the ranking function has to be learned on the fly before it can be applied. In this paper we present a parallel implementation of an on-demand L2R technique that drastically reduces the response time of the previous serial implementation. Our implementation uses thousands of GPU threads to learn a ranking function for each query and takes advantage of a reduced training set obtained through active learning. Experiments with the LETOR benchmark show that our proposed approach achieves a mean speedup of 127x in query processing time when compared to the sequential version, while producing very competitive ranking effectiveness.

Congress on Evolutionary Computation | 2014

A social-evolutionary approach to compose a similarity function used on event recommendation

Luiz Mario L. Pascoal; Celso G. Camilo; Edjalma Q. da Silva; Thierson Couto Rosa

With the development of Web 2.0, social networks have gained a large presence on the internet, and many users share information about themselves and their interests there. Expert systems that use these interests to recommend products are known as Recommender Systems. One of the main Recommender System techniques is user-based Collaborative Filtering, which recommends products to users based on what similar people liked in the past. However, the methods used to determine similarity between users have presented some problems. This work therefore proposes using social variables in the composition of the similarity function applied to a user when recommending events. To test the proposal, details of the friends and events of two target users of the social network Facebook were extracted. The results were compared with different deterministic heuristics, the Euclidean distance and a random method. The proposed model showed promising results and great potential for expansion to other contexts.

BMC Bioinformatics | 2013

SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

Wellington Santos Martins; Welton Couto Carmo; Thierson Couto Rosa; Thiago Fernando Rangel

Background: Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability.

Results: In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/.

Conclusion: We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.

Concurrency and Computation: Practice and Experience | 2018

Parallel rule-based selective sampling and on-demand learning to rank

Mateus Ferreira e Freitas; Daniel Xavier de Sousa; Wellington Santos Martins; Thierson Couto Rosa; Rodrigo M. Silva; Marcos A. Gonçalves

Learning to rank (L2R) works by constructing a ranking model from training data so that, given a new query, the model is able to generate an effective ranking of the objects for the query. Almost all work in L2R focuses on ranking accuracy, leaving performance and scalability overlooked. However, performance is a critical factor, especially when dealing with on-demand queries. In this scenario, Learning to Rank using association rules has been shown to be extremely effective, but only at a high computational cost. In this work, we show how to exploit parallelism in rule-based systems to: i) drastically reduce L2R training datasets using selective sampling and ii) generate query-customized ranking models on the fly. We present parallel algorithms and GPU implementations for these two tasks, showing that dataset reduction takes only a few seconds, with speedups of up to 148x over a serial baseline, and that queries can be processed in only a few milliseconds, with speedups of 1000x over a serial baseline and 29x over a parallel baseline in the best case. We also extend the implementations to work with multiple GPUs, further increasing the speedup over the baselines and showing the scalability of our proposed algorithms.

Conference on Information and Knowledge Management | 2016

Incorporating Risk-Sensitiveness into Feature Selection for Learning to Rank

Daniel Xavier de Sousa; Sérgio D. Canuto; Thierson Couto Rosa; Wellington Santos Martins; Marcos André Gonçalves

Learning to Rank (L2R) is currently an essential task in basically all types of information systems, given the huge and ever-increasing amount of data made available. While many solutions have been proposed to improve L2R functions, relatively little attention has been paid to the task of improving the quality of the feature space. L2R strategies usually rely on dense feature representations, which contain noisy or redundant features that increase the cost of the learning process without any benefit. Although feature selection (FS) strategies can be applied to reduce dimensionality and noise, side effects of such procedures have been neglected, such as the risk of getting very poor predictions in a few (but important) queries. In this paper we propose multi-objective FS strategies that optimize both aspects at the same time: ranking performance and risk-sensitive evaluation. For this, we approximate the Pareto-optimal set for multi-objective optimization in a new and original application to L2R. Our contributions include novel FS methods for L2R which optimize multiple, potentially conflicting, criteria. In particular, one of the objectives (risk-sensitive evaluation) has never been optimized in the context of FS for L2R before. Our experimental evaluation shows that our proposed methods select features that are more effective (in ranking performance) and lower-risk than those selected by other state-of-the-art FS methods.
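The notion of a Pareto-optimal set over two criteria can be made concrete with a small sketch. The candidate feature subsets and their scores below are invented placeholders, not results from the paper, and the exhaustive dominance check stands in for whatever approximation procedure the authors actually use.

```python
def dominates(a, b):
    """True if score a dominates score b.

    Scores are (effectiveness, risk): higher effectiveness is better,
    lower risk is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }

# Hypothetical feature subsets with made-up (effectiveness, risk) scores.
candidates = {
    "subset_a": (0.45, 0.30),
    "subset_b": (0.42, 0.10),
    "subset_c": (0.40, 0.25),  # worse than subset_b on both criteria
    "subset_d": (0.47, 0.35),
}

front = pareto_front(candidates)
print(sorted(front))
```

Every subset left in `front` represents a different trade-off between the two criteria, which is why a multi-objective method returns a set of candidates rather than a single winner.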

Expert Systems With Applications | 2016