David Ruano-Ordás | Researchain

Publication

Featured research published by David Ruano-Ordás.

Applied Soft Computing | 2012

Rough sets for spam filtering: Selecting appropriate decision rules for boundary e-mail classification

Noemí Pérez-Díaz; David Ruano-Ordás; José Ramon Méndez; Juan F. Gálvez; Florentino Fdez-Riverola

Nowadays, spam represents an extensive subset of the information delivered through the Internet, involving all unsolicited and disturbing communications received while using different services including e-mail, weblogs and forums. In this context, this paper reviews and brings together previous approaches and novel alternatives for applying rough set (RS) theory to the spam filtering domain by defining three different rule execution schemes: MFD (most frequent decision), LNO (largest number of objects) and LTS (largest total strength). With the goal of correctly assessing the suitability of the proposed algorithms, we specifically address and analyse significant questions for appropriate model validation like corpus selection, preprocessing and representational issues, as well as different specific benchmarking measures. From the experiments carried out using several execution schemes for selecting appropriate decision rules generated by rough sets, we conclude that the proposed algorithms can outperform other well-known anti-spam filtering techniques such as support vector machines (SVM), Adaboost and different types of Bayes classifiers.

Expert Systems With Applications | 2012

SDAI: An integral evaluation methodology for content-based spam filtering models

Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

The Tragedy of the Commons theory introduced by Hardin (1968) revealed how shared and limited resources get completely depleted as an effect of human behaviour. By analogy, common spamming activities can be properly modelled by this solid theory and, consequently, a young Internet security industry has recently emerged to fight against spam. However, the massive intensification of spam deliveries in recent years has led to the need for a significant improvement in filter accuracy. In this context, current research efforts are mainly focussed on providing a wide variety of content-based techniques able to overcome common spam filtering inconveniences. Although theoretical filtering evaluation is generally taken into consideration in scientific works, most of the evaluation protocols are not appropriate to correctly assess the performance of models during filter operation in real environments. In order to cover the gap between basic research and applied deployment of well-known spam filtering techniques, this work proposes a novel straightforward evaluation methodology able to rank available models using four different but complementary perspectives: static, dynamic, adaptive and internationalisation. In the present study, we applied our SDAI methodology to compare eight different well-known content-based spam filtering techniques using several established accuracy measures. Results showed the effect of the knowledge grain-size and evidenced several unexpected situations related to the behaviour of the analysed models.

Software - Practice and Experience | 2013

Wirebrush4SPAM: a novel framework for improving efficiency on spam filtering services

Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

This paper introduces Wirebrush4SPAM, a plug‐in‐based C framework specifically designed for the development of fast spam filters by assembling different antispam schemes and techniques. Wirebrush4SPAM can be used to (i) build, execute and deploy simple spam filters and (ii) develop new techniques that can be easily combined and tested to achieve more accurate antispam models. To construct custom filters, programmers should manage three key concepts: filtering functions, parsers and event listeners. The main features of Wirebrush4SPAM include (i) a plug‐in‐based design, (ii) cache support for developing new plug‐ins, (iii) a smart filter evaluation heuristic for improving filter execution, (iv) configurable rule scheduling and (v) support for domain specific rules. Moreover, Wirebrush4SPAM is 10 times faster than SpamAssassin, which is the most popular and highly extensible framework for spam filtering. Wirebrush4SPAM is an open‐source project licensed under the terms of the GNU Lesser General Public License, and both source code and documentation are publicly available at http://www.wb4spam.org/.

Journal of Systems and Software | 2013

Effective scheduling strategies for boosting performance on rule-based spam filtering frameworks

David Ruano-Ordás; Jorge Fdez-Glez; Florentino Fdez-Riverola; José Ramon Méndez

Despite the enormous importance of e-mail to current worldwide communication, the increase of spam deliveries has had a significant adverse effect for all its users. In order to adequately fight spam, both the filtering industry and scientific community have developed and deployed the fastest and most accurate filtering techniques. However, the increasing volume of new incoming messages needing classification together with the lack of adequate support for anti-spam services on the cloud, make filtering efficiency an absolute necessity. In this context, and given the extensive utilization and increasing significance of rule-based filtering frameworks for the anti-spam domain, this work studies and analyses the importance of both existing and novel scheduling strategies to make the most of currently available anti-spam filtering techniques. Results obtained from the experiments demonstrated that some scheduling alternatives resulted in time savings of up to 26% for filtering messages, while maintaining the same classification accuracy.
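The abstract above does not describe the scheduling strategies themselves, so the following is only a minimal sketch of why rule order can change filtering time without changing the verdict. It assumes a SpamAssassin-style scoring filter (each rule adds a score, and a message is spam when the total reaches a threshold) that stops as soon as the remaining rules can no longer change the outcome. The `Rule` class, the `cost` field, the rule names and both ordering functions are hypothetical illustrations, not the strategies evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    score: float                    # positive = spam evidence, negative = ham evidence
    cost: float                     # hypothetical relative execution cost
    matches: Callable[[str], bool]  # the rule's test on the raw message


def classify(message: str, rules: List[Rule], threshold: float) -> Tuple[bool, int]:
    """Return (is_spam, rules_executed), stopping once the verdict is fixed."""
    total = 0.0
    pos_left = sum(r.score for r in rules if r.score > 0)  # best case still available
    neg_left = sum(r.score for r in rules if r.score < 0)  # worst case still available
    executed = 0
    for rule in rules:
        if total + neg_left >= threshold:  # spam even if every ham rule fires
            return True, executed
        if total + pos_left < threshold:   # ham even if every spam rule fires
            return False, executed
        executed += 1
        if rule.matches(message):
            total += rule.score
        if rule.score > 0:
            pos_left -= rule.score
        else:
            neg_left -= rule.score
    return total >= threshold, executed


def by_declaration(rules: List[Rule]) -> List[Rule]:
    """Baseline schedule: run rules in the order they were declared."""
    return list(rules)


def by_impact_then_cost(rules: List[Rule]) -> List[Rule]:
    """Hypothetical schedule: largest absolute score first, cheapest first on ties."""
    return sorted(rules, key=lambda r: (-abs(r.score), r.cost))


RULES = [
    Rule("HAS_GREETING", -1.0, 1.0, lambda m: "hello" in m.lower()),
    Rule("MENTIONS_VIAGRA", 3.0, 1.0, lambda m: "viagra" in m.lower()),
    Rule("MANY_EXCLAMATIONS", 2.0, 2.0, lambda m: m.count("!") >= 3),
    Rule("HAS_PLAIN_URL", 4.0, 5.0, lambda m: "http://" in m),
]

MESSAGE = "Hello!!! Cheap viagra at http://example.test"

print(classify(MESSAGE, by_declaration(RULES), 5.0))       # (True, 4)
print(classify(MESSAGE, by_impact_then_cost(RULES), 5.0))  # (True, 2)
```

Both schedules reach the same verdict because the stopping test only fires when no remaining rule could flip it; the reordered schedule simply reaches that point after two rules instead of four.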

Expert Systems With Applications | 2015

A dynamic model for integrating simple web spam classification techniques

Jorge Fdez-Glez; David Ruano-Ordás; José Ramon Méndez; Florentino Fdez-Riverola; Rosalía Laza; Reyes Pavón

Techniques and heuristics applied for spam dissemination. Rule-based classification model for web spam detection. Knowledge actualization based on incremental learning. Filter performance improvement by classifier fusion. WSF2 framework publicly available under LGPL license.

Over the last years, Internet spam content has spread enormously inside web sites mainly due to the emergence of new web technologies oriented towards the online sharing of resources and information. In such a situation, both academia and industry have shown their concern to accurately detect and effectively control web spam, resulting in a good number of anti-spam techniques currently available. However, the successful integration of different algorithms for web spam classification is still a challenge. In this context, the present study introduces WSF2, a novel web spam filtering framework specifically designed to take advantage of multiple classification schemes and algorithms. In detail, our approach encodes the life cycle of a case-based reasoning system, being able to use appropriate knowledge and dynamically adjust different parameters to ensure continuous improvement in filtering precision with the passage of time. In order to correctly evaluate the effectiveness of the dynamic model, we designed a set of experiments involving a publicly available corpus, as well as different simple well-known classifiers and ensemble approaches. The results revealed that WSF2 performed well, being able to take advantage of each classifier and to achieve a better performance when compared to other alternatives. WSF2 is an open-source project licensed under the terms of the LGPL publicly available at https://sourceforge.net/projects/wsf2c/.

Software - Practice and Experience | 2016

Using new scheduling heuristics based on resource consumption information for increasing throughput on rule-based spam filtering systems

David Ruano-Ordás; Jorge Fdez-Glez; Florentino Fdez-Riverola; José Ramon Méndez

The large increase in spam deliveries since the first half of 2013 has caused hard-to-solve problems for spam filters. In order to adequately fight spam, the throughput of spam filtering platforms must be increased. In this context, and taking into consideration the widespread utilization of rule‐based filtering frameworks in the spam filtering domain, this work proposes three novel scheduling strategies for optimizing the time needed to classify new incoming e‐mails through an intelligent management of computational resources depending on the Central Processing Unit (CPU) usage and Input/Output (I/O) delays. In order to demonstrate the suitability of our approaches, we include in our experiments a comparative study against other successful heuristics previously published in the scientific literature. The results demonstrated that one of our alternative heuristics allows time savings of up to 10% in message filtering, while keeping the same classification accuracy.

Scientific Programming | 2016

WSF2: a novel framework for filtering web spam

Jorge Fdez-Glez; David Ruano-Ordás; Rosalía Laza; José Ramon Méndez; Reyes Pavón; Florentino Fdez-Riverola

Over the last years, research on web spam filtering has gained interest from both academia and industry. In this context, although there are a good number of successful antispam techniques available (i.e., content-based, link-based, and hiding), an adequate combination of different algorithms supported by an advanced web spam filtering platform would offer more promising results. To this end, we propose the WSF2 framework, a new platform particularly suitable for filtering spam content on web pages. Currently, our framework allows the easy combination of different filtering techniques including, but not limited to, regular expressions and well-known classifiers (i.e., Naive Bayes, Support Vector Machines, and C5.0). Applying our WSF2 framework over the publicly available WEBSPAM-UK2007 corpus, we have been able to demonstrate that a simple combination of different techniques is able to improve the accuracy of single classifiers on web spam detection. As a result, we conclude that the proposed filtering platform is a powerful tool for boosting applied research in this area.
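As a rough illustration of combining a regular expression with classifier outputs, the sketch below merges several techniques through a weighted average of their scores. It is not the WSF2 API or its combination scheme: the `Technique` type, the helper names, the weights and the keyword-ratio scorer (a toy stand-in for a trained classifier such as Naive Bayes) are all assumptions made for this example.

```python
import re
from typing import Callable, List, Tuple

# A technique maps page text to a spam score in [0, 1] (hypothetical interface).
Technique = Callable[[str], float]


def regex_technique(pattern: str) -> Technique:
    """Score 1.0 when the pattern occurs anywhere in the page, else 0.0."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda page: 1.0 if compiled.search(page) else 0.0


def keyword_ratio_technique(keywords: List[str]) -> Technique:
    """Toy stand-in for a trained classifier: scaled share of spammy words."""
    vocabulary = {k.lower() for k in keywords}

    def score(page: str) -> float:
        words = re.findall(r"[a-z]+", page.lower())
        if not words:
            return 0.0
        hits = sum(1 for w in words if w in vocabulary)
        return min(1.0, 4.0 * hits / len(words))

    return score


def combined_score(techniques: List[Tuple[Technique, float]], page: str) -> float:
    """Weighted average of the individual technique scores."""
    total_weight = sum(weight for _, weight in techniques)
    return sum(tech(page) * weight for tech, weight in techniques) / total_weight


def is_spam(techniques: List[Tuple[Technique, float]], page: str,
            threshold: float = 0.5) -> bool:
    return combined_score(techniques, page) >= threshold


TECHNIQUES = [
    (regex_technique(r"casino|pills"), 1.0),
    (keyword_ratio_technique(["cheap", "casino", "pills", "bonus"]), 2.0),
]

print(is_spam(TECHNIQUES, "Buy cheap pills now, cheap casino bonus"))        # True
print(is_spam(TECHNIQUES, "University department of computer science news"))  # False
```

In practice the weights and threshold would be tuned on a labelled corpus; here they are fixed by hand purely to keep the example short.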

International Conference on Evolutionary Computation | 2018

Quadcriteria optimization of binary classifiers: error rates, coverage, and complexity

Vitor Basto-Fernandes; Iryna Yevseyeva; David Ruano-Ordás; Jiaqi Zhao; Florentino Fdez-Riverola; José Ramon Méndez; Michael T. M. Emmerich

This paper presents a 4-objective evolutionary multiobjective optimization study for optimizing the error rates (false positives, false negatives), reliability, and complexity of binary classifiers. The example taken is the email anti-spam filtering problem.
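With four objectives there is usually no single best classifier, only a set of non-dominated trade-offs. The sketch below shows the standard Pareto-dominance test for objective vectors that are all minimised. The tuple layout (false positive rate, false negative rate, a third objective expressed as a quantity to minimise, and a complexity count) and the candidate values are assumptions made for illustration; the paper's exact objective definitions and optimiser are not reproduced here.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the points that no other point dominates (all objectives minimised)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical candidate filters: (FP rate, FN rate, third objective, complexity)
CANDIDATES = [
    (0.01, 0.10, 0.05, 12),  # A
    (0.02, 0.08, 0.05, 12),  # B: worse FP than A but better FN, so a trade-off
    (0.02, 0.10, 0.06, 15),  # C: dominated by A
    (0.01, 0.10, 0.05, 20),  # D: same errors as A but more complex
]

print(pareto_front(CANDIDATES))
# [(0.01, 0.1, 0.05, 12), (0.02, 0.08, 0.05, 12)]
```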

Information Sciences | 2018

Concept drift in e-mail datasets: An empirical study with practical implications

David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

The Internet e-mail service emerged in the late seventies to enable fast message exchange through computer networks. Network users immediately discovered the value of this service (sometimes for improper purposes such as spamming). As e-mail became indispensable to increase personal productivity, the volume of spam deliveries grew constantly. With the passage of time, a great number of proposals and tools have emerged to fight against spam. However, the vast majority of them do not properly take into consideration the inner attributes of spam and ham messages such as the noise or the presence of concept drift. In this work, we provide a detailed empirical study of concept drift in the e-mail domain taking into consideration two key aspects: existing types of concept drift and the real class of messages (spam and ham). As a result, our study reveals different weaknesses of multiple e-mail filtering alternatives and other relevant works in this domain and identifies new strategies to develop more accurate filters. Finally, the experimentation carried out in this work has motivated the development of a concept drift analyser tool for the e-mail domain that can be freely downloaded from https://github.com/sing-group/conceptDriftAnalyser.git .

Information Processing and Management | 2018

Using evolutionary computation for discovering spam patterns from e-mail samples

David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

One of the most relevant problems affecting the efficient use of e-mail to communicate worldwide is the spam phenomenon. Spamming involves flooding the Internet with undesired messages aimed at promoting illegal or low value products and services. Beyond the existence of different well-known machine learning techniques, collaborative schemes and other complementary approaches, some popular anti-spam frameworks such as SpamAssassin or Wirebrush4SPAM enabled the possibility of using regular expressions to effectively improve filter performance. In this work, we provide a review of existing proposals to automatically generate fully functional regular expressions from any input dataset combining spam and ham messages. Due to configuration difficulties and the low performance achieved by analysed schemes, in this work we introduce DiscoverRegex, a novel automatic spam pattern-finding tool. Patterns generated by DiscoverRegex outperform those created by existing approaches (able to avoid FP errors) whilst minimising the computational resources required for its proper operation. DiscoverRegex source code is publicly available at https://github.com/sing-group/DiscoverRegex .
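To make the "able to avoid FP errors" criterion concrete, the sketch below scores candidate regular expressions against small spam and ham samples and keeps only those that match no ham message. It illustrates the evaluation step only and is not the DiscoverRegex search procedure; the function names, sample messages and candidate patterns are invented for this example.

```python
import re
from typing import Dict, List


def evaluate_pattern(pattern: str, spam: List[str], ham: List[str]) -> Dict[str, int]:
    """Count spam hits (tp), ham hits (fp) and missed spam (fn) for one pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    tp = sum(1 for message in spam if compiled.search(message))
    fp = sum(1 for message in ham if compiled.search(message))
    return {"tp": tp, "fp": fp, "fn": len(spam) - tp}


def select_safe_patterns(candidates: List[str], spam: List[str],
                         ham: List[str]) -> List[str]:
    """Keep patterns with zero false positives, ordered by spam coverage."""
    scored = [(p, evaluate_pattern(p, spam, ham)) for p in candidates]
    safe = [(p, s) for p, s in scored if s["fp"] == 0 and s["tp"] > 0]
    return [p for p, s in sorted(safe, key=lambda item: -item[1]["tp"])]


SPAM = ["Cheap V1agra online", "Win a FREE prize now", "free v1agra samples"]
HAM = ["Are you free for lunch?", "Meeting notes attached"]
CANDIDATES = [r"free", r"v[i1]agra", r"free\s+prize"]

print(evaluate_pattern(r"free", SPAM, HAM))          # {'tp': 2, 'fp': 1, 'fn': 1}
print(select_safe_patterns(CANDIDATES, SPAM, HAM))   # ['v[i1]agra', 'free\\s+prize']
```

The broad pattern `free` is rejected because it also matches a legitimate message, while the two narrower patterns survive with no false positives.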
