Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Juan José Rodríguez is active.

Publication


Featured research published by Juan José Rodríguez.


IEEE Transactions on Medical Imaging | 2010

Random Subspace Ensembles for fMRI Classification

Ludmila I. Kuncheva; Juan José Rodríguez; Catrin O. Plumpton; David Edmund Johannes Linden; Stephen Johnston

Classification of brain images obtained through functional magnetic resonance imaging (fMRI) poses a serious challenge to pattern recognition and machine learning due to the extremely large feature-to-instance ratio. This calls for revision and adaptation of the current state-of-the-art classification methods. We investigate the suitability of the random subspace (RS) ensemble method for fMRI classification. RS samples from the original feature set and builds one (base) classifier on each subset. The ensemble assigns a class label by either majority voting or averaging of output probabilities. Looking for guidelines for setting the two parameters of the method (ensemble size and feature sample size), we introduce three criteria calculated through these parameters: usability of the selected feature sets, coverage of the set of "important" features, and feature set diversity. Optimized together, these criteria work toward producing accurate and diverse individual classifiers. RS was tested on three fMRI datasets from single-subject experiments: the Haxby data (Haxby et al., 2001) and two datasets collected in-house. We found that RS with support vector machines (SVM) as the base classifier outperformed single classifiers as well as some of the most widely used classifier ensembles such as bagging, AdaBoost, random forest, and rotation forest. The closest rivals were the single SVM and bagging of SVM classifiers. We use kappa-error diagrams to understand the success of RS.
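
The RS scheme maps naturally onto scikit-learn. Below is a minimal sketch, assuming scikit-learn >= 1.2 (earlier versions name the first argument base_estimator) and a synthetic data set standing in for fMRI voxels; the parameter values are placeholders, not those tuned in the paper.

```python
# Random subspace ensemble with SVM base classifiers: each member sees a
# random 5% of the features, with no instance resampling, and the ensemble
# votes on the label.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for fMRI data: few instances, many features.
X, y = make_classification(n_samples=100, n_features=2000,
                           n_informative=50, random_state=0)

rs_svm = BaggingClassifier(
    estimator=SVC(kernel="linear"),  # base classifier
    n_estimators=50,                 # ensemble size (first RS parameter)
    max_features=0.05,               # feature sample size (second RS parameter)
    bootstrap=False,                 # RS samples features, not instances
    random_state=0,
)
rs_svm.fit(X, y)
```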


International Conference on Multiple Classifier Systems | 2007

An experimental study on rotation forest ensembles

Ludmila I. Kuncheva; Juan José Rodríguez

Rotation Forest is a recently proposed method for building classifier ensembles using independently trained decision trees. It was found to be more accurate than bagging, AdaBoost and Random Forest ensembles across a collection of benchmark data sets. This paper carries out a lesion study on Rotation Forest in order to find out which of the parameters and randomization heuristics are responsible for its good performance. Contrary to common intuition, the features extracted through PCA gave the best results, compared to those extracted through non-parametric discriminant analysis (NDA) or random projections. The only ensemble method whose accuracy was statistically indistinguishable from that of Rotation Forest was LogitBoost, although it gave slightly inferior results on 20 out of the 32 benchmark data sets. It appears that the main factor in the success of Rotation Forest is that the transformation matrix employed to calculate the (linear) extracted features is sparse.
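
A condensed sketch of the core transformation, assuming no class/instance subsampling before PCA (the full method adds these randomization heuristics for extra diversity) and at least as many training instances as features per subset:

```python
# One Rotation Forest member: split the features into K disjoint subsets,
# fit PCA on each, assemble the loadings into a block-diagonal (hence
# sparse) rotation matrix, and train a tree on the rotated data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_tree(X, y, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    rotation = np.zeros((n_features, n_features))
    for idx in subsets:
        # All principal components are kept; centering is ignored here
        # for simplicity.
        pca = PCA().fit(X[:, idx])
        rotation[np.ix_(idx, idx)] = pca.components_.T
    tree = DecisionTreeClassifier(random_state=seed).fit(X @ rotation, y)
    return tree, rotation  # predict with tree.predict(X_new @ rotation)
```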


IEEE Transactions on Knowledge and Data Engineering | 2007

Classifier Ensembles with a Random Linear Oracle

Ludmila I. Kuncheva; Juan José Rodríguez

We propose a combined fusion-selection approach to classifier ensemble design. Each classifier in the ensemble is replaced by a miniensemble of a pair of subclassifiers with a random linear oracle to choose between the two. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments were carried out with 35 data sets from UCI and 11 ensemble models. Each ensemble model was examined with and without the oracle. The results showed that all ensemble methods benefited from the new approach, most markedly so random subspace and bagging. A further experiment with seven real medical data sets demonstrates the validity of these findings outside the UCI data collection.
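
A minimal sketch of one such miniensemble member; the hyperplane construction (through the midpoint of two randomly drawn training points) is one common choice and an assumption here, not a detail taken from the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomLinearOracleMember:
    """A pair of subclassifiers plus a random hyperplane that routes
    each instance to one of the two."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        i, j = self.rng.choice(len(X), size=2, replace=False)
        self.w = X[i] - X[j]                   # hyperplane normal
        self.b = self.w @ (X[i] + X[j]) / 2.0  # passes through the midpoint
        side = X @ self.w > self.b             # each side holds >= 1 point
        self.clfs = (DecisionTreeClassifier().fit(X[~side], y[~side]),
                     DecisionTreeClassifier().fit(X[side], y[side]))
        return self

    def predict(self, X):
        side = X @ self.w > self.b
        out = np.empty(len(X), dtype=object)
        for s, clf in ((~side, self.clfs[0]), (side, self.clfs[1])):
            if s.any():
                out[s] = clf.predict(X[s])
        return out
```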


Magnetic Resonance Imaging | 2010

Classifier Ensembles for fMRI Data Analysis: An Experiment

Ludmila I. Kuncheva; Juan José Rodríguez

Functional magnetic resonance imaging (fMRI) is becoming a forefront tool for brain-computer interfaces. To decipher brain patterns, fast, accurate and reliable classification methods are needed. The support vector machine (SVM) classifier has traditionally been used. Here we argue that state-of-the-art methods from pattern recognition and machine learning, such as classifier ensembles, offer more accurate classification. This study compares 18 classification methods on a publicly available real data set due to Haxby et al. [Science 293 (2001) 2425-2430]. The data come from a single-subject experiment, organized in 10 runs, with eight classes of stimuli presented in each run. The comparisons were carried out on voxel subsets of different sizes, selected through seven popular voxel selection methods. We found that, while SVM was robust, accurate and scalable, some classifier ensemble methods demonstrated significantly better performance. The best classifiers were found to be the random subspace ensemble of SVM classifiers, rotation forest, and ensembles with random linear and random spherical oracles.
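
The random spherical oracle differs from the linear oracle sketched above only in the routing rule. A sketch, where the center and radius choices (a random training point and the median distance to it) are assumptions made to keep both subsets populated:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_spherical_oracle_member(X, y, seed=0):
    rng = np.random.default_rng(seed)
    center = X[rng.integers(len(X))]          # random training point
    dist = np.linalg.norm(X - center, axis=1)
    radius = np.median(dist)                  # roughly half the data inside
    inside = dist <= radius
    # One subclassifier per side of the sphere; route new instances by
    # comparing their distance to `center` against `radius`.
    clf_in = DecisionTreeClassifier().fit(X[inside], y[inside])
    clf_out = DecisionTreeClassifier().fit(X[~inside], y[~inside])
    return center, radius, clf_in, clf_out
```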


Knowledge and Information Systems | 2014

A weighted voting framework for classifiers ensembles

Ludmila I. Kuncheva; Juan José Rodríguez

We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, recall combiner and the naive Bayes combiner. The framework is based on two assumptions: class-conditional independence of the classifier outputs and an assumption about the individual accuracies. The four combiners are derived subsequently from one another, by progressively relaxing and then eliminating the second assumption. In parallel, the number of trainable parameters increases from one combiner to the next. Simulation studies reveal that if the parameter estimates are accurate and the first assumption is satisfied, the order of preference of the combiners is: naive Bayes, recall, weighted majority and majority. By inducing label noise, we expose a caveat coming from the stability-plasticity dilemma. Experimental results with 73 benchmark data sets reveal that there is no definitive best combiner among the four candidates, giving a slight preference to naive Bayes. This combiner was better for problems with a large number of fairly balanced classes, while weighted majority vote was better for problems with a small number of unbalanced classes.
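
For the weighted majority vote, the framework's optimal weight for a classifier with estimated accuracy p is log(p / (1 - p)). A sketch with illustrative votes and accuracies:

```python
import numpy as np

def weighted_majority_vote(votes, accuracies, classes):
    """votes: (n_classifiers, n_samples) array of predicted labels."""
    p = np.asarray(accuracies, dtype=float)
    w = np.log(p / (1 - p))                  # optimal weights under the framework
    scores = np.array([(w[:, None] * (votes == c)).sum(axis=0)
                       for c in classes])    # weighted support per class
    return np.asarray(classes)[scores.argmax(axis=0)]

votes = np.array([[1, 1, 1],
                  [0, 0, 1],
                  [0, 1, 1]])
# On the first sample, the single accurate classifier (0.9) overrides the
# two weaker dissenters; plain majority voting would return 0 there.
print(weighted_majority_vote(votes, [0.9, 0.7, 0.6], classes=[0, 1]))
# -> [1 1 1]
```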


ACM Symposium on Applied Computing | 2004

Interval and dynamic time warping-based decision trees

Juan José Rodríguez; Carlos J. Alonso

This work presents decision trees adequate for the classification of series data. There are several methods for this task, but most of them focus on accuracy. One of the requirements of data mining, however, is to produce comprehensible models, and decision trees are among the most comprehensible classifiers. Using such methods directly on this kind of data is generally not adequate, because complex and inaccurate classifiers are obtained. Hence, instead of using the raw features, new ones are constructed. This work presents two types of trees. In interval-based trees, the decision nodes evaluate a function (e.g., the average) over an interval and compare the result to a threshold. In DTW-based trees, each decision node has a reference example; the distance from the example to be classified to the reference example is calculated and then compared to a threshold. The method for obtaining these trees is based on: 1) developing a method that, for a two-class data set, obtains a classifier formed by a new feature (a function over an interval, or the distance to a reference example) and a threshold; 2) using boosting to obtain an ensemble of these classifiers; and 3) constructing decision trees using the features selected by boosting as the data set.
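
A sketch of the two node types; the O(nm) dynamic-programming DTW below is the textbook formulation, not code from the paper:

```python
import numpy as np

def interval_node(series, start, end, threshold):
    """Interval-based node: average over [start, end) versus a threshold."""
    return series[start:end].mean() <= threshold

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_node(series, reference, threshold):
    """DTW-based node: distance to a stored reference series versus a threshold."""
    return dtw_distance(series, reference) <= threshold
```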


Knowledge-Based Systems | 2005

Support vector machines of interval-based features for time series classification

Juan José Rodríguez; Carlos J. Alonso; Jose A. Maestro

In previous work, a time series classification system was presented, based on boosting very simple classifiers formed by only one literal, where the literals are based on temporal intervals. The obtained classifiers were simply a linear combination of literals, so it is natural to expect improved results if those literals are combined in more complex ways. In this work we explore the possibility of using the literals selected by the boosting algorithm as new features, and then training an SVM on these metafeatures. The experimental results show the validity of the proposed method.
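
A rough sketch of the pipeline, with ordinary decision stumps standing in for the paper's interval literals (an assumption: the real system boosts literals over temporal intervals); scikit-learn >= 1.2 is assumed for the estimator keyword:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Step 1: boosting selects the weak classifiers (the "literals").
booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=25, random_state=0).fit(X, y)

# Step 2: each selected stump's output becomes one metafeature.
meta = np.column_stack([s.predict(X) for s in booster.estimators_])

# Step 3: an SVM combines the metafeatures in a more complex way than
# the linear combination that boosting itself produces.
svm = SVC(kernel="rbf").fit(meta, y)
```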


Knowledge-Based Systems | 2015

Random Balance

José F. Díez-Pastor; Juan José Rodríguez; César Ignacio García-Osorio; Ludmila I. Kuncheva

Highlights: the class proportions for each ensemble member are chosen randomly; each member's training data are sub-sampled and over-sampled through SMOTE; RB-Boost combines Random Balance with AdaBoost.M2; experiments with 86 data sets demonstrate the advantage of Random Balance.

In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty of the approach is that the proportions of the classes for each ensemble member are chosen randomly. The intuition behind the method is that the proposed diversity heuristic will ensure that the ensemble contains classifiers that are specialized for different operating points on the ROC space, thereby leading to larger AUC compared to other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself, and also in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost, which combines Random Balance with AdaBoost.M2. This combination involves enforcing random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well-known repositories demonstrate the advantage of the Random Balance approach.
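
A minimal sketch of how one Random Balance ensemble member could be built, assuming the imbalanced-learn package for SMOTE and enough instances per class for SMOTE's default five neighbors; details such as the quota bounds are illustrative assumptions:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def random_balance_member(X, y, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)                # assumes a two-class problem
    n = len(y)
    n0 = int(rng.integers(6, n - 5))      # random class-0 quota (>= 6 so
                                          # SMOTE's default k_neighbors works)
    target = {classes[0]: n0, classes[1]: n - n0}
    # Sub-sample any class that is above its quota...
    keep = np.hstack([rng.choice(np.where(y == c)[0],
                                 size=min(target[c], np.sum(y == c)),
                                 replace=False)
                      for c in classes])
    # ...and top up any class below its quota with SMOTE instances.
    Xs, ys = SMOTE(sampling_strategy=target,
                   random_state=seed).fit_resample(X[keep], y[keep])
    return DecisionTreeClassifier(random_state=seed).fit(Xs, ys)
```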


Information Sciences | 2015

Diversity techniques improve the performance of the best imbalance learning ensembles

José-Francisco Díez-Pastor; Juan José Rodríguez; César Ignacio García-Osorio; Ludmila I. Kuncheva

Many real-life problems can be described as imbalanced: the number of instances belonging to one of the classes is much larger than the numbers in the other classes. Examples are spam detection, credit card fraud detection and medical diagnosis. Ensembles of classifiers have acquired popularity for this kind of problem thanks to their ability to obtain better results than individual classifiers. The techniques most commonly used by ensembles specifically designed to deal with imbalanced problems are re-weighting, oversampling and undersampling. Other techniques, originally intended to increase ensemble diversity, have not been systematically studied for their effect on imbalanced problems; among these are Random Oracles, Disturbing Neighbors, Random Feature Weights and Rotation Forest. This paper presents an overview and an experimental study of various ensemble-based methods for imbalanced problems. The methods have been tested in their original form and in conjunction with several diversity-increasing techniques, using 84 imbalanced data sets from two well-known repositories. The paper shows that these diversity-increasing techniques significantly improve the performance of ensemble methods for imbalanced problems, and it provides some guidance on when it is more convenient to use them.


Pattern Recognition Letters | 2008

Boosting recombined weak classifiers

Juan José Rodríguez; Jesús Maudes

Boosting is a family of methods for the construction of classifier ensembles. The distinguishing feature of these methods is that they allow a strong classifier to be obtained from the combination of weak classifiers; therefore, it is possible to use boosting with very simple base classifiers. Among the simplest classifiers are decision stumps: decision trees with only one decision node. This work proposes a variant of the best-known boosting method, AdaBoost. It is based on considering, as the base classifiers for boosting, not only the last weak classifier, but a classifier formed by the last r selected weak classifiers (r is a parameter of the method). If the weak classifiers are decision stumps, the combination of r weak classifiers is a decision tree. The ensembles obtained with the variant are formed by the same number of decision stumps as the original AdaBoost. Hence, the original version and the variant produce classifiers with very similar sizes and computational complexities (for training and classification). The experimental study shows that the variant is clearly beneficial.
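
A simplified sketch of the idea for binary labels in {-1, +1}: the stump trained at each step is recombined with the previous r-1 stumps by unweighted vote, and that combined prediction drives the AdaBoost weight update. This is a loose reading of the variant, not the authors' exact algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_recombined(X, y, T=50, r=3):
    """y must take values in {-1, +1}."""
    w = np.ones(len(y)) / len(y)
    stumps, alphas, recent = [], [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        recent = (recent + [stump])[-r:]          # last r selected stumps
        pred = np.sign(sum(s.predict(X) for s in recent))
        pred[pred == 0] = 1                       # break voting ties
        err = np.clip(w @ (pred != y), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # usual AdaBoost weight
        w *= np.exp(-alpha * y * pred)            # update on the combination
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas
```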

Collaboration


Dive into Juan José Rodríguez's collaboration.

Top Co-Authors

Carlos Alonso

Technical University of Madrid
