José Ramon Méndez | Researchain

Publication

Featured research published by José Ramon Méndez.

Expert Systems With Applications | 2007

Applying lazy learning algorithms to tackle concept drift in spam filtering

Florentino Fdez-Riverola; Eva Lorenzo Iglesias; Fernando Díaz; José Ramon Méndez; Juan M. Corchado

A great number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, the disadvantage with many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to adapt to local concept drift. Concept drift in spam is particularly difficult as the spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mails) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved in order to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques able to select both terms and e-mails representative of the current situation. The enhanced system is evaluated against other well-known successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The results obtained from the experiments carried out are very promising and back up the idea that instance-based reasoning systems can offer a number of advantages in tackling concept drift in dynamic problems, as in the case of the anti-spam filtering domain.

Decision Support Systems | 2007

SpamHunting: An instance-based reasoning system for spam labelling and filtering

Florentino Fdez-Riverola; Eva Lorenzo Iglesias; Fernando Díaz; José Ramon Méndez; Juan M. Corchado

In this paper we present an instance-based reasoning e-mail filtering model that outperforms classical machine learning techniques and other successful lazy learning approaches in the domain of anti-spam filtering. The architecture of the learning-based anti-spam filter is based on a tuneable enhanced instance retrieval network able to accurately generalize e-mail representations. The reuse of similar messages is carried out by a simple unanimous voting mechanism to determine whether the target case is spam or not. Before the final response of the system, a revision stage is performed only when the assigned class is spam, whereby the system employs general knowledge in the form of meta-rules.
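As a rough illustration of the unanimous voting step mentioned in the abstract above, the sketch below labels a target e-mail as spam only when every retrieved similar message is spam. The function name, the label strings and the handling of an empty retrieval set are assumptions made for this example; the abstract does not specify them, and this is not the SpamHunting implementation.

```python
from typing import Sequence


def unanimous_vote(neighbour_labels: Sequence[str]) -> str:
    """Classify a target e-mail from the labels of its retrieved neighbours.

    Assumption: 'unanimous' means the spam label is assigned only if all
    retrieved messages are spam; any disagreement yields 'legitimate'.
    """
    # Assumption: with no retrieved neighbours, fall back to 'legitimate'
    # so that a message is never blocked without supporting evidence.
    if not neighbour_labels:
        return "legitimate"
    if all(label == "spam" for label in neighbour_labels):
        return "spam"
    return "legitimate"
```

Requiring unanimity biases the decision towards the legitimate class, which fits a setting where false positives are costlier than false negatives.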

Applied Soft Computing | 2012

Rough sets for spam filtering: Selecting appropriate decision rules for boundary e-mail classification

Noemí Pérez-Díaz; David Ruano-Ordás; José Ramon Méndez; Juan F. Gálvez; Florentino Fdez-Riverola

Nowadays, spam represents an extensive subset of the information delivered through the Internet, involving all unsolicited and disturbing communications received while using different services including e-mail, weblogs and forums. In this context, this paper reviews and brings together previous approaches and novel alternatives for applying rough set (RS) theory to the spam filtering domain by defining three different rule execution schemes: MFD (most frequent decision), LNO (largest number of objects) and LTS (largest total strength). With the goal of correctly assessing the suitability of the proposed algorithms, we specifically address and analyse significant questions for appropriate model validation, such as corpus selection, preprocessing and representational issues, as well as different specific benchmarking measures. From the experiments carried out using several execution schemes for selecting appropriate decision rules generated by rough sets, we conclude that the proposed algorithms can outperform other well-known anti-spam filtering techniques such as support vector machines (SVM), Adaboost and different types of Bayes classifiers.

International Conference on Data Mining | 2006

A comparative performance study of feature selection methods for the anti-spam filtering domain

José Ramon Méndez; Florentino Fdez-Riverola; Fernando Díaz; Eva Lorenzo Iglesias; Juan M. Corchado

In this paper we analyse the strengths and weaknesses of the most widely used feature selection methods in text categorization when they are applied to the spam problem domain. Several experiments with different feature selection methods and content-based filtering techniques are carried out and discussed. Information Gain, χ²-test, Mutual Information and Document Frequency feature selection methods have been analysed in conjunction with Naive Bayes, boosting trees, Support Vector Machines and ECUE models in different scenarios. From the experiments carried out, the underlying ideas behind feature selection methods are identified and applied to improve the feature selection process of SpamHunting, a novel anti-spam filtering software able to accurately classify suspicious e-mails.
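To make one of the feature selection methods named above concrete, here is a minimal sketch of Information Gain for a single term over a two-class (spam versus legitimate) corpus, using the standard definition IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. The data representation (each e-mail as a set of terms, labels as booleans) and the function names are assumptions for illustration, not the setup used in the paper.

```python
import math
from typing import Sequence, Set


def entropy(spam: int, legit: int) -> float:
    """Binary class entropy in bits for the given class counts."""
    total = spam + legit
    if total == 0:
        return 0.0
    h = 0.0
    for count in (spam, legit):
        if count:  # skip empty classes, since 0 * log(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(term: str, docs: Sequence[Set[str]],
                     is_spam: Sequence[bool]) -> float:
    """Reduction in class entropy obtained by knowing whether `term` occurs."""
    n = len(docs)
    if n == 0:
        return 0.0
    # Contingency counts: term presence versus class.
    with_spam = sum(1 for d, s in zip(docs, is_spam) if term in d and s)
    with_legit = sum(1 for d, s in zip(docs, is_spam) if term in d and not s)
    without_spam = sum(1 for d, s in zip(docs, is_spam) if term not in d and s)
    without_legit = n - with_spam - with_legit - without_spam

    h_before = entropy(with_spam + without_spam, with_legit + without_legit)
    n_with = with_spam + with_legit
    h_after = ((n_with / n) * entropy(with_spam, with_legit)
               + ((n - n_with) / n) * entropy(without_spam, without_legit))
    return h_before - h_after
```

Terms would then be ranked by this score and the top-scoring ones kept as features; a term that occurs equally often in both classes scores zero.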

Expert Systems With Applications | 2012

SDAI: An integral evaluation methodology for content-based spam filtering models

Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

The Tragedy of the Commons theory introduced by Hardin (1968) revealed how shared and limited resources get completely depleted as an effect of human behaviour. By analogy, common spamming activities can be properly modelled by this solid theory and, consequently, a young Internet security industry has recently emerged to fight against spam. However, the massive intensification of spam deliveries in recent years has led to the need for a significant improvement in filter accuracy. In this context, current research efforts are mainly focussed on providing a wide variety of content-based techniques able to overcome common spam filtering inconveniences. Although theoretical filtering evaluation is generally taken into consideration in scientific works, most of the evaluation protocols are not appropriate to correctly assess the performance of models during filter operation in real environments. In order to cover the gap between basic research and applied deployment of well-known spam filtering techniques, this work proposes a novel straightforward evaluation methodology able to rank available models using four different but complementary perspectives: static, dynamic, adaptive and internationalisation. In the present study, we applied our SDAI methodology to compare eight different well-known content-based spam filtering techniques using several established accuracy measures. Results showed the effect of the knowledge grain-size and evidenced several unexpected situations related to the behaviour of the analysed models.

Aerobiologia | 2002

The relationship between the flowering phenophase and airborne pollen of Betula in Galicia (N.W. Spain)

Victoria Jato; José Ramon Méndez; Javier Rodríguez-Rajo; Carmen Seijo

The aim of this work was to investigate the phenological behaviour of Betula in Galicia, NW Spain, and to examine the relationship between the Betula pollen curves and the flowering phenophase. Three trees were chosen from each of the nine populations of Betula located at different altitudes and phytogeographic positions. Phenological observations of the flowering periods of Betula were made in each of them. Environmental factors such as frequency of mist, latitudinal and topographic position, proximity of the ocean, degree of solar exposure, and altitude result in phenological differences between the investigated populations. The correlation between the Pollinic Production Index of Betula pollen in Galicia and the aerobiological data of the seven monitoring stations showed that the period in which the highest concentrations were registered was almost synchronous with the flowering times at most of the phenological stations studied. Other factors such as transport and reflotation should also be taken into account to provide an adequate interpretation of the aerobiological data of Betula pollen in the atmosphere.

Lecture Notes in Computer Science | 2006

Tracking concept drift at feature selection stage in SpamHunting: an anti-spam instance-based reasoning system

José Ramon Méndez; Florentino Fdez-Riverola; Eva Lorenzo Iglesias; Fernando Díaz; Juan M. Corchado

In this paper we propose a novel feature selection method able to handle concept drift problems in the spam filtering domain. The proposed technique is applied to a previously successful instance-based reasoning e-mail filtering system called SpamHunting. Our achieved information criterion is based on several ideas extracted from the well-known information measure introduced by Shannon. We show how our previous system, in combination with the improved feature selection method, outperforms classical machine learning techniques and other well-known lazy learning approaches. In order to evaluate the performance of all the analysed models, we employ two different corpora and six well-known metrics in various scenarios.

Expert Systems With Applications | 2010

BioDR: Semantic indexing networks for biomedical document retrieval

Anália Lourenço; Rafael Carreira; Daniel Glez-Peña; José Ramon Méndez; Sónia Carneiro; Luis Mateus Rocha; Fernando Díaz; E. C. Ferreira; Isabel Rocha; Florentino Fdez-Riverola; Miguel Rocha

In biomedical research, retrieving documents that match an interesting query is a task performed quite frequently. Typically, the set of obtained results is extensive, contains many non-interesting documents and consists of a flat list, i.e., not organized or indexed in any way. This work proposes BioDR, a novel approach that allows the semantic indexing of the results of a query by identifying relevant terms in the documents. These terms emerge from a process of Named Entity Recognition that annotates occurrences of biological terms (e.g. genes or proteins) in abstracts or full texts. The system is based on a learning process that builds an Enhanced Instance Retrieval Network (EIRN) from a set of manually classified documents, regarding their relevance to a given problem. The resulting EIRN implements the semantic indexing of documents and terms, allowing for enhanced navigation and visualization tools, as well as the assessment of relevance for new documents.

Software - Practice and Experience | 2013

Wirebrush4SPAM: a novel framework for improving efficiency on spam filtering services

Noemí Pérez-Díaz; David Ruano-Ordás; Florentino Fdez-Riverola; José Ramon Méndez

This paper introduces Wirebrush4SPAM, a plug‐in‐based C framework specifically designed for the development of fast spam filters by assembling different antispam schemes and techniques. Wirebrush4SPAM can be used to (i) build, execute and deploy simple spam filters and (ii) develop new techniques that can be easily combined and tested to achieve more accurate antispam models. To construct custom filters, programmers should manage three key concepts: filtering functions, parsers and event listeners. The main features of Wirebrush4SPAM include (i) a plug‐in‐based design, (ii) cache support for developing new plug‐ins, (iii) a smart filter evaluation heuristic for improving filter execution, (iv) configurable rule scheduling and (v) support for domain-specific rules. Moreover, Wirebrush4SPAM is 10 times faster than SpamAssassin, which stands as the most popular and highly extensible framework for spam filtering. Wirebrush4SPAM is an open‐source project licensed under the terms of the GNU Lesser General Public License, and both source code and documentation are publicly available at http://www.wb4spam.org/.

Journal of Systems and Software | 2013

Effective scheduling strategies for boosting performance on rule-based spam filtering frameworks

David Ruano-Ordás; Jorge Fdez-Glez; Florentino Fdez-Riverola; José Ramon Méndez

Despite the enormous importance of e-mail to current worldwide communication, the increase of spam deliveries has had a significant adverse effect on all its users. In order to adequately fight spam, both the filtering industry and scientific community have developed and deployed the fastest and most accurate filtering techniques. However, the increasing volume of new incoming messages needing classification, together with the lack of adequate support for anti-spam services on the cloud, makes filtering efficiency an absolute necessity. In this context, and given the extensive utilization and increasing significance of rule-based filtering frameworks for the anti-spam domain, this work studies and analyses the importance of both existing and novel scheduling strategies to make the most of currently available anti-spam filtering techniques. Results obtained from the experiments demonstrated that some scheduling alternatives resulted in time savings of up to 26% for filtering messages, while maintaining the same classification accuracy.
