Renato Moraes Silva
State University of Campinas
Publication
Featured research published by Renato Moraes Silva.
International Conference on Machine Learning and Applications | 2012
Renato Moraes Silva; Akebo Yamakami; Tiago A. Almeida
The web is becoming an increasingly important source of entertainment, communication, research, news and trade. As a result, websites compete to attract the attention of users, and many of them achieve visibility through malicious strategies that try to circumvent search engines. Such sites are known as web spam and they are generally responsible for personal harm and economic losses. Given this scenario, this paper presents a comprehensive performance evaluation of several established machine learning techniques used to automatically detect and filter hosts that disseminate web spam. Our experiments were carefully designed to ensure statistically sound results, and they indicate that bagging of decision trees, multilayer perceptron neural networks, random forest and adaptive boosting of decision trees are promising in the task of web spam classification and, hence, can be used as a good baseline for further comparison.
Knowledge Based Systems | 2017
Renato Moraes Silva; Tiago A. Almeida; Akebo Yamakami
In many areas, the volume of text information is increasing rapidly, thereby demanding efficient text classification approaches. Several methods are available at present, but most exhibit declining performance as the dimensionality of the problem increases, or they incur high computational costs for training, which limits their application in real scenarios. Thus, it is necessary to develop a method that can process high-dimensional data rapidly. In this study, we propose MDLText, an efficient, lightweight, scalable, and fast multinomial text classifier based on the minimum description length principle. MDLText exhibits fast incremental learning and is sufficiently robust to prevent overfitting, which are desirable features in real-world applications, large-scale problems, and online scenarios. Our experiments were carefully designed to ensure statistically sound results, which demonstrated that the proposed approach achieves a good balance between predictive power and computational efficiency.
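The general idea of classifying by minimum description length can be illustrated with a small sketch. This is not the published MDLText formulation: the class name `MDLSketchClassifier`, the Laplace smoothing, and the per-token code length of `-log2 p` are assumptions chosen for illustration. The sketch keeps only token counts per class, so learning is incremental, and it predicts the class whose counts encode a document in the fewest bits.

```python
import math
from collections import Counter, defaultdict


class MDLSketchClassifier:
    """Toy MDL-style text classifier (illustrative; not the published MDLText)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # class -> token frequencies
        self.total_tokens = Counter()             # class -> total tokens seen
        self.vocabulary = set()                   # all tokens seen so far

    def partial_fit(self, tokens, label):
        """Incremental update: only counts change, so no retraining is needed."""
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.vocabulary.update(tokens)

    def description_length(self, tokens, label):
        """Bits needed to encode `tokens` with the counts of class `label`."""
        vocab_size = len(self.vocabulary) + 1  # +1 reserves mass for unseen tokens
        bits = 0.0
        for token in tokens:
            # Laplace-smoothed probability of the token within this class.
            count = self.token_counts[label][token]
            p = (count + 1) / (self.total_tokens[label] + vocab_size)
            bits += -math.log2(p)
        return bits

    def predict(self, tokens):
        """Return the class that yields the shortest description."""
        if not self.token_counts:
            raise ValueError("classifier has not seen any training data")
        return min(
            self.token_counts,
            key=lambda label: self.description_length(tokens, label),
        )


clf = MDLSketchClassifier()
clf.partial_fit(["win", "free", "prize"], "spam")
clf.partial_fit(["free", "money"], "spam")
clf.partial_fit(["meeting", "tomorrow"], "ham")
clf.partial_fit(["lunch", "tomorrow"], "ham")
print(clf.predict(["free", "prize"]))
```

Each training call and each prediction touches only the tokens of one document, which is the property that makes count-based models of this kind attractive for online settings.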
Ibero-American Conference on Artificial Intelligence | 2012
Renato Moraes Silva; Tiago A. Almeida; Akebo Yamakami
The steady growth and popularization of the Web increase competition among websites and create opportunities for profit in several segments. Thus, there is great interest in keeping a website well positioned in search results. The problem is that many websites use techniques to circumvent search engines, which degrades the search results and exposes users to dangerous content. Given this scenario, this paper presents a performance evaluation of different artificial neural network models for automatically classifying web spam. We conducted an empirical experiment using a well-known, large and public web spam database. The results indicate that the evaluated approaches outperform the state-of-the-art web spam filters.
Expert Systems With Applications | 2017
Renato Moraes Silva; Túlio C. Alberto; Tiago A. Almeida; Akebo Yamakami
A new classifier is presented to detect undesired short text comments. The proposed approach is light, fast, multinomial and offers incremental learning. The impact of applying text normalization and semantic indexing is studied. The results indicate the proposed techniques outperformed most of the approaches. Text normalization and semantic indexing enhanced the classifiers' performance.
The popularity and reach of short text messages commonly used in electronic communication have led spammers to use them to propagate undesired content. This content is often composed of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning and, therefore, most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, to the context of filtering undesired short text messages. The proposed approach supports incremental learning and, therefore, its predictive model is scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. In addition to being dynamic, these messages are short and usually poorly written, rife with slang, symbols, and abbreviations that hinder text representation, learning, and filtering. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We showed that these techniques can improve the quality of the text content and, consequently, enhance the performance of expert systems for spam detection.
Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicate that it is effective at filtering undesired short text messages.
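One simple way to combine predictions made on several variants of the same message is sketched below. The abstract does not state the combination rule, so majority voting is an assumption here, and `ensemble_predict`, the `transforms` mapping and the keyword-based toy classifier are hypothetical names used only to make the example runnable.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(message, transforms, classifiers):
    """Classify each variant of `message` with its own model, then vote.

    `transforms` maps a variant name to a function that rewrites the text;
    `classifiers` maps the same name to a function returning a label.
    Any classification method can be plugged in.
    """
    votes = [
        classifiers[name](transform(message))
        for name, transform in transforms.items()
    ]
    return majority_vote(votes)


# Hypothetical stand-ins for the original text and a normalized variant.
transforms = {
    "original": lambda text: text,
    "normalized": lambda text: text.lower().replace("u", "you"),
}


def keyword_classifier(text):
    """Toy model: flags text that contains the word 'free'."""
    return "spam" if "free" in text.lower() else "ham"


classifiers = {name: keyword_classifier for name in transforms}
print(ensemble_predict("FREE ringtone 4 u", transforms, classifiers))
```

Because the voting step only sees labels, the underlying classifiers can be swapped without touching the ensemble logic.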
International Conference on Machine Learning and Applications | 2016
Renato Moraes Silva; Tiago A. Almeida; Akebo Yamakami
The steady growth and popularization of the Web have led spammers to develop techniques to circumvent search engines in order to gain good visibility for their web pages in search results. These techniques are responsible for serious problems such as dissatisfaction, irritation, exposure to unpleasant or malicious content, and financial loss. Although different machine learning approaches have been used to detect web spam, many of them suffer from the curse of dimensionality or incur a very high computational cost, which impedes their use in real scenarios. Consequently, considerable effort is still being made to develop more advanced methods that both prevent overfitting and learn quickly. To fill this gap, we present MDLClass, a classification technique based on the minimum description length principle, applied to the context of web spam filtering. The proposed method is very efficient, lightweight, multi-class, and fast. We also evaluated a new approach to detect web spam that combines the predictions obtained by the classifiers using content-based, link-based, and transformed link-based features. In our experiments, we employed two real, public and large datasets: WEBSPAM-UK2006 and WEBSPAM-UK2007. The results indicate that the proposed MDLClass and the ensemble of predictions using different types of features are promising in the task of web spam filtering.
Neurocomputing | 2018
Emerson F. Cardoso; Renato Moraes Silva; Tiago A. Almeida
Online opinions significantly influence consumer purchase decisions. Unfortunately, this has led to a dramatic increase in fake (or spam) reviews that can damage the reputation of brands and artificially manipulate users’ perceptions of products and companies. Despite the efforts of several studies on fake review detection, important questions remain open. For instance, there is no consensus on whether the performance of the classification methods is affected when they are used in real-world scenarios that require online learning. Moreover, it is not known whether the performance of the methods decreases due to the time-ordered nature of the reviews. To answer these and other important open questions, this work presents a comprehensive analysis of content-based classification methods for fake review detection. The experiments were performed in multiple settings, employing different types of learning and datasets. A careful analysis of the results provided sufficient evidence to answer the open questions, and the results can be used as a baseline for future studies.
Archive | 2012
Renato Moraes Silva; Tiago A. Almeida; Akebo Yamakami
International Journal of Information Security Science | 2013
Tiago A. Almeida; Renato Moraes Silva; Akebo Yamakami
International Symposium on Neural Networks | 2018
Leandro L. Tavares; Renato Moraes Silva; Tiago A. Almeida
iSys - Revista Brasileira de Sistemas de Informação | 2018
Renato Moraes Silva; Tiago A. Almeida; Akebo Yamakami