

Publications


Featured research published by César Ignacio García-Osorio.


Artificial Intelligence | 2010

Democratic instance selection: A linear complexity instance selection algorithm based on classifier ensemble concepts

César Ignacio García-Osorio; Aida de Haro-García; Nicolás García-Pedrajas

Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly being produced in many fields of research. Although current algorithms are useful for fairly large datasets, scaling problems are found when the number of instances is in the hundreds of thousands or millions. When we face huge problems, scalability becomes an issue, and most algorithms are not applicable. Thus, paradoxically, instance selection algorithms are for the most part impracticable for the same problems that would benefit most from their use. This paper presents a way of avoiding this difficulty using several rounds of instance selection on subsets of the original dataset. These rounds are combined using a voting scheme to allow good performance in terms of testing error and storage reduction, while the execution time of the process is significantly reduced. The method is particularly efficient when we use instance selection algorithms that are high in computational cost. The proposed approach shares the philosophy underlying the construction of ensembles of classifiers. In an ensemble, several weak learners are combined to form a strong classifier; in our method several weak (in the sense that they are applied to subsets of the data) instance selection algorithms are combined to produce a strong and fast instance selection method. An extensive comparison of 30 medium and large datasets from the UCI Machine Learning Repository using 3 different classifiers shows the usefulness of our method. Additionally, the method is applied to 5 huge datasets (from three hundred thousand to more than a million instances) with good results and fast execution time.
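The voting scheme described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the Gaussian data, the k=1 ENN-style base selector, and the round/threshold values are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs with some overlap.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def enn_flags(Xs, ys):
    """Flag instances whose nearest neighbour (within the subset) has a
    different label -- a Wilson/ENN-style editing criterion with k=1."""
    d = np.linalg.norm(Xs[:, None] - Xs[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    return ys[nn] != ys

def democratic_is(X, y, rounds=10, subsets=4, threshold=6, rng=rng):
    """Several rounds of instance selection on random subsets; each round
    casts a 'remove' vote per flagged instance, and instances with fewer
    votes than the threshold are kept."""
    votes = np.zeros(len(X), dtype=int)
    for _ in range(rounds):
        perm = rng.permutation(len(X))
        for part in np.array_split(perm, subsets):
            votes[part] += enn_flags(X[part], y[part])
    return votes < threshold

keep = democratic_is(X, y)
print(f"kept {keep.sum()} of {len(X)} instances")
```

Because each round only runs the base selector on small subsets, the cost grows roughly linearly with the number of instances, which is the point of the approach.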


Knowledge Based Systems | 2015

Random Balance

José F. Díez-Pastor; Juan José Rodríguez; César Ignacio García-Osorio; Ludmila I. Kuncheva

Highlights: the class proportions for each ensemble member are chosen randomly; member training data are obtained by sub-sampling and over-sampling through SMOTE; RB-Boost combines Random Balance with AdaBoost.M2; experiments with 86 data sets demonstrate the advantage of Random Balance.

In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty in the approach is that the proportions of the classes for each ensemble member are chosen randomly. The intuition behind the method is that the proposed diversity heuristic will ensure that the ensemble contains classifiers that are specialized for different operating points on the ROC space, thereby leading to larger AUC compared to other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself, and also in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost which combines Random Balance with AdaBoost.M2. This combination involves enforcing random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well known repositories demonstrate the advantage of the Random Balance approach.
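The sampling step for one ensemble member can be sketched as below. All concrete details here are assumptions for illustration: the toy data, the interpolation-based stand-in for SMOTE (no neighbour search), and the uniform choice of class proportion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy imbalanced 2-class data: 160 majority vs. 40 minority instances.
X = np.vstack([rng.normal(0, 1, (160, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([0] * 160 + [1] * 40)

def smote_like(Xc, n_new, rng):
    """Minimal SMOTE-style oversampling: each synthetic point is an
    interpolation between two random instances of the same class."""
    i = rng.integers(len(Xc), size=n_new)
    j = rng.integers(len(Xc), size=n_new)
    lam = rng.random((n_new, 1))
    return Xc[i] + lam * (Xc[j] - Xc[i])

def random_balance_sample(X, y, rng):
    """Draw a training set of the original size with RANDOM class
    proportions: each class is sub-sampled if its target count is
    smaller, or padded with SMOTE-style instances if it is larger."""
    n = len(X)
    n_pos = int(rng.integers(2, n - 1))
    n_neg = n - n_pos
    out_X, out_y = [], []
    for label, n_target in ((0, n_neg), (1, n_pos)):
        Xc = X[y == label]
        if n_target <= len(Xc):
            take = rng.choice(len(Xc), n_target, replace=False)
            Xn = Xc[take]
        else:
            Xn = np.vstack([Xc, smote_like(Xc, n_target - len(Xc), rng)])
        out_X.append(Xn)
        out_y.append(np.full(n_target, label))
    return np.vstack(out_X), np.concatenate(out_y)

Xb, yb = random_balance_sample(X, y, rng)
```

Training each ensemble member on a fresh call of `random_balance_sample` gives classifiers tuned to different operating points, which is the diversity heuristic the abstract describes.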


Information Sciences | 2015

Diversity techniques improve the performance of the best imbalance learning ensembles

José-Francisco Díez-Pastor; Juan José Rodríguez; César Ignacio García-Osorio; Ludmila I. Kuncheva

Many real-life problems can be described as imbalanced: the number of instances belonging to one of the classes is much larger than the number in the other classes. Examples are spam detection, credit card fraud detection and medical diagnosis. Ensembles of classifiers have become popular for this kind of problem because of their ability to obtain better results than individual classifiers. The techniques most commonly used by the ensembles especially designed to deal with imbalanced problems are, for example, re-weighting, oversampling and undersampling. Other techniques, originally intended to increase ensemble diversity, have not been systematically studied for their effect on imbalanced problems. Among these are Random Oracles, Disturbing Neighbors, Random Feature Weights and Rotation Forest. This paper presents an overview and an experimental study of various ensemble-based methods for imbalanced problems. The methods were tested in their original form and in conjunction with several diversity-increasing techniques, using 84 imbalanced data sets from two well-known repositories. The paper shows that these diversity-increasing techniques significantly improve the performance of ensemble methods for imbalanced problems, and it provides some guidance on when it is more convenient to use them.


Information Sciences | 2012

Supervised subspace projections for constructing ensembles of classifiers

Nicolás García-Pedrajas; Jesús Maudes-Raedo; César Ignacio García-Osorio; Juan J. Rodríguez-Díez

We present a method for constructing ensembles of classifiers using supervised projections of random subspaces. The method combines the philosophy of boosting, focusing on difficult instances, with the improved accuracy achieved by supervised projection methods to obtain very good results in terms of testing error. To achieve both accuracy and diversity, random subspaces are created at each step, and within each random subspace a supervised projection is obtained using only the misclassified instances. The next classifier is trained using all available examples, in the space given by the supervised projections. The method is compared with AdaBoost and other ensemble methods, showing improved performance on a set of 32 problems from the UCI Machine Learning Repository. In terms of testing error, it obtains results that are significantly better than AdaBoost and the random subspace method, using a decision tree as base learner. Furthermore, the robustness of the method in the presence of class label noise is above that of AdaBoost. A study performed using kappa-error diagrams shows that the proposed method improves the results of boosting by obtaining diverse and more accurate classifiers. The decomposition of testing error into bias and variance terms shows that our method performs better than Bagging in terms of reducing the bias term of the error, and better than AdaBoost in terms of reducing the variance term.
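One step of the scheme (random subspace, then a supervised projection computed from the misclassified instances only) might look as follows. This is a hypothetical sketch: the data are synthetic, the random `miscls` mask stands in for "instances the previous classifier got wrong", and a regularized Fisher-discriminant direction is used as a simple example of a linear supervised projection.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy 2-class data in 6 dimensions.
X = np.vstack([rng.normal(0, 1, (40, 6)), rng.normal(1, 1, (40, 6))])
y = np.array([0] * 40 + [1] * 40)
miscls = rng.random(80) < 0.3   # stand-in for "misclassified so far"

def supervised_projection_step(X, y, miscls, subspace_size=3, rng=rng):
    """Pick a random subspace, then compute a Fisher-style direction
    from the misclassified instances only (regularized for stability)."""
    attrs = rng.choice(X.shape[1], subspace_size, replace=False)
    Xm, ym = X[miscls][:, attrs], y[miscls]
    mu0, mu1 = Xm[ym == 0].mean(axis=0), Xm[ym == 1].mean(axis=0)
    Sw = np.cov(Xm[ym == 0].T) + np.cov(Xm[ym == 1].T)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(subspace_size), mu1 - mu0)
    return attrs, w

attrs, w = supervised_projection_step(X, y, miscls)
proj = X[:, attrs] @ w   # ALL instances projected into the new space
```

Note that, as in the abstract, only the misclassified instances shape the projection, but every instance is then projected and used to train the next classifier.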


Information Fusion | 2012

Random feature weights for decision tree ensemble construction

Jesús Maudes; Juan José Rodríguez; César Ignacio García-Osorio; Nicolás García-Pedrajas

This paper proposes a method for constructing ensembles of decision trees: random feature weights (RFW). The method is similar to Random Forest in that both introduce randomness into the construction of the decision trees. In Random Forest only a random subset of attributes is considered at each node, whereas RFW considers all of them. The source of randomness is a weight associated with each attribute. All the nodes in a tree use the same set of random weights, but this set differs from the weights used in other trees. Hence, the importance given to the attributes is different in each tree, which differentiates their construction. The method is compared to Bagging, Random Forest, Random Subspaces, AdaBoost and MultiBoost, obtaining favourable results for the proposed method, especially when using noisy data sets. RFW can also be combined with these methods; generally, the combination of RFW with another method produces better results than the combined methods alone. Kappa-error diagrams and kappa-error movement diagrams are used to analyse the relationship between the accuracies of the base classifiers and their diversity.
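The attribute-selection idea can be sketched with a toy split criterion. Everything concrete here is an assumption for illustration: the data, the crude mean-threshold "merit" function, and the exponent used to shape the random weights; the point is only that every attribute is considered and its merit is multiplied by a per-tree random weight.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-class data in 5 dimensions, all attributes informative.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(1.5, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

def stump_gains(X, y):
    """Crude per-attribute merit: accuracy of the best mean-threshold
    split (a stand-in for information gain)."""
    gains = np.empty(X.shape[1])
    for a in range(X.shape[1]):
        pred = (X[:, a] > X[:, a].mean()).astype(int)
        gains[a] = max((pred == y).mean(), (pred != y).mean())
    return gains

def rfw_choose_attribute(X, y, p=2, rng=rng):
    """RFW-style choice: unlike Random Forest, EVERY attribute competes,
    but its merit is scaled by a random weight u**p.  In a real tree the
    same weight vector would be reused at every node of that tree."""
    w = rng.random(X.shape[1]) ** p
    return int(np.argmax(stump_gains(X, y) * w))

# Different weight draws (one per "tree") favour different attributes.
attrs = [rfw_choose_attribute(X, y) for _ in range(20)]
```

The random weights make each tree prefer different attributes, which is the source of diversity the abstract describes.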


Applications of Supervised and Unsupervised Ensemble Methods | 2009

Disturbing Neighbors Diversity for Decision Forests

Jesús Maudes; Juan José Rodríguez; César Ignacio García-Osorio

Ensemble methods take their output from a set of base predictors. The ensemble accuracy depends on two factors: the base classifiers' accuracy and their diversity (how different the base classifiers' outputs are from each other). This paper presents an approach for increasing the diversity of the base classifiers. The method builds some new features to be added to the training dataset of each base classifier. Those new features are computed using a Nearest Neighbor (NN) classifier built from a few randomly selected instances. The NN classifier returns: (i) an indicator pointing to the nearest neighbor, and (ii) the class this NN predicts for the instance. We tested this idea using decision trees as base classifiers. An experimental validation on 62 UCI datasets is provided for traditional ensemble methods, showing that ensemble accuracy and base classifier diversity are usually improved.
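The feature-building step can be sketched as below; the toy data and the choice of m = 5 anchor instances are assumptions, but the two kinds of appended features (a one-hot "nearest anchor" indicator and the 1-NN-predicted class) follow the description above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 30 instances, 4 original features, binary labels.
X = rng.normal(0, 1, (30, 4))
y = rng.integers(0, 2, 30)

def disturbing_neighbors_features(X, y, m=5, rng=rng):
    """Append (i) a one-hot indicator of which of m random anchor
    instances is nearest, and (ii) the class that a 1-NN classifier over
    those anchors predicts.  A different random anchor set per base
    classifier is what injects the diversity."""
    anchors = rng.choice(len(X), m, replace=False)
    d = np.linalg.norm(X[:, None] - X[anchors][None, :], axis=2)
    nearest = d.argmin(axis=1)
    one_hot = np.eye(m)[nearest]
    nn_class = y[anchors][nearest].reshape(-1, 1)
    return np.hstack([X, one_hot, nn_class])

Xd = disturbing_neighbors_features(X, y)   # shape (30, 4 + 5 + 1)
```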


Expert Systems With Applications | 2011

Constructing ensembles of classifiers using supervised projection methods based on misclassified instances

Nicolás García-Pedrajas; César Ignacio García-Osorio

In this paper, we propose an approach to ensemble construction based on the use of supervised projections, both linear and non-linear, to achieve both accuracy and diversity of the individual classifiers. The proposed approach uses the philosophy of boosting, putting more effort into difficult instances, but instead of learning the classifier on a biased distribution of the training set, it uses the misclassified instances to find a supervised projection that favors their correct classification. We show that supervised projection algorithms can be used for this task, and we try several known supervised projections, both linear and non-linear, to test their ability within the present framework. Additionally, the method is further improved by introducing concepts from oversampling for imbalanced datasets, which counteracts the negative effect of having few instances available for constructing the supervised projections. The method is compared with AdaBoost, showing improved performance on a large set of 45 problems from the UCI Machine Learning Repository, as well as better robustness in the presence of noise.


Pattern Recognition Letters | 2010

Forests of nested dichotomies

Juan José Rodríguez; César Ignacio García-Osorio; Jesús Maudes

Ensemble methods are often able to generate more accurate classifiers than the individual classifiers. In multiclass problems, it is possible to obtain an ensemble by combining binary classifiers. It is sensible to use a multiclass method for constructing the binary classifiers, because the ensemble of binary classifiers can be more accurate than the individual multiclass classifier. Ensembles of nested dichotomies (END) are a method for dealing with multiclass classification problems using binary classifiers: a nested dichotomy organizes the classes in a tree, where each internal node has a binary classifier. A set of classes can be organized in different ways in a nested dichotomy, and an END is formed by several nested dichotomies. This paper studies the use of this method in conjunction with ensembles of decision trees (forests). Although forest methods are able to deal directly with several classes, their accuracies can be improved if they are used as base classifiers for ensembles of nested dichotomies. Moreover, the accuracies can be improved even more using forests of nested dichotomies, that is, ensemble methods that use as base classifiers a nested dichotomy of decision trees. The improvements over forest methods can be explained by the increased diversity of the base classifiers. The best overall results were obtained using MultiBoost with resampling.
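The tree-of-classes structure can be sketched as follows. The five class labels and the fixed per-node probability in `class_prob` are assumptions for illustration; the structure generation (random recursive splits of the class set) matches the description above, and class probabilities multiply along the path from root to leaf.

```python
import random

random.seed(4)

def random_dichotomy(classes):
    """Recursively split the class set into two random non-empty halves;
    leaves are single classes, internal nodes are (left, right) pairs,
    each of which would hold one binary classifier."""
    if len(classes) == 1:
        return classes[0]
    k = random.randint(1, len(classes) - 1)
    shuffled = random.sample(classes, len(classes))
    return (random_dichotomy(shuffled[:k]), random_dichotomy(shuffled[k:]))

def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def class_prob(tree, c, p_left=0.5):
    """Probability of class c under a toy model where every internal
    binary classifier outputs p_left for its left branch; a real END
    would use each node's learned class-membership probability."""
    if not isinstance(tree, tuple):
        return 1.0 if tree == c else 0.0
    return (p_left * class_prob(tree[0], c, p_left)
            + (1 - p_left) * class_prob(tree[1], c, p_left))

tree = random_dichotomy([0, 1, 2, 3, 4])
```

An END (or a forest of nested dichotomies) would draw several such random trees, each yielding a different decomposition of the multiclass problem into binary ones.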


Information Fusion | 2016

Fusion of instance selection methods in regression tasks

Álvar Arnaiz-González; Marcin Blachnik; Mirosław Kordos; César Ignacio García-Osorio

Highlights: few instance selection (IS) methods exist for regression; two different families of instance selection methods for regression are compared; one is based on a simple discretization of the output variable, yet gives good results; both approaches can be used to adapt IS methods for classification to regression; the fusion of these IS algorithms in an ensemble for regression is also analyzed.

Data pre-processing is a very important aspect of data mining. In this paper we discuss instance selection used for prediction algorithms, which is one of the pre-processing approaches. The purpose of instance selection is to improve data quality by data size reduction and noise elimination. Until recently, instance selection has been applied mainly to classification problems, and very few recent papers address instance selection for regression tasks. This paper proposes the fusion of instance selection algorithms for regression tasks to improve selection performance. As members of the ensemble, two different families of instance selection methods are evaluated: one based on a distance threshold and the other on converting the regression task into a multiple-class classification task. An extensive experimental evaluation performed on the two regression versions of the Edited Nearest Neighbor (ENN) and Condensed Nearest Neighbor (CNN) methods showed that the best performance, measured by error value and data size reduction, is in most cases obtained by the ensemble methods.
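The conversion step of the second family (regression into multiple-class classification) can be sketched as below; the synthetic data and the equal-frequency binning with 5 classes are assumptions, not the paper's exact discretization.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy regression data: linear target plus a little noise.
X = rng.normal(0, 1, (60, 3))
y_reg = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 60)

def discretize_target(y, bins=5):
    """Map the continuous target onto equal-frequency class labels, so
    that a classification IS method (e.g. ENN or CNN) can be applied to
    the regression data unchanged."""
    edges = np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(y, edges)

labels = discretize_target(y_reg)
```

After selection, the surviving instances keep their original continuous targets; the class labels exist only so the classification-oriented selector has something to vote on.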


Applied Intelligence | 2011

Random projections for linear SVM ensembles

Jesús Maudes; Juan José Rodríguez; César Ignacio García-Osorio; Carlos Pardo

This paper presents an experimental study using different projection strategies and techniques to improve the performance of Support Vector Machine (SVM) ensembles. The study was made over 62 UCI datasets using Principal Component Analysis (PCA) and three types of Random Projections (RP), taking into account the size of the projected space and using linear SVMs as base classifiers. Random Projections are also combined with the sparse-matrix strategy used by Rotation Forests, a method likewise based on projections. Experiments show that, for SVM ensembles, (i) the sparse-matrix strategy leads to the best results, (ii) results improve when the projected space has a higher dimension than the original one, and (iii) Random Projections also enhance the results when used instead of PCA. Finally, random-projected SVMs are tested as base classifiers of some state-of-the-art ensembles, improving their performance.
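One common sparse random projection (the Achlioptas construction) can be sketched as below; the toy data and the specific output dimension are assumptions, and the choice to project into a space larger than the original mirrors finding (ii) above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: 20 instances with 10 features each.
X = rng.normal(0, 1, (20, 10))

def sparse_random_projection(d_in, d_out, rng):
    """Achlioptas-style sparse projection matrix: entries are +1 or -1
    with probability 1/6 each and 0 with probability 2/3, scaled by
    sqrt(3 / d_out) to approximately preserve distances."""
    R = rng.choice([-1.0, 0.0, 1.0], size=(d_in, d_out), p=[1/6, 2/3, 1/6])
    return R * np.sqrt(3.0 / d_out)

# Project into a space LARGER than the original (10 -> 15 features);
# each ensemble member would draw its own random matrix R.
R = sparse_random_projection(10, 15, rng)
Xp = X @ R
```

A linear SVM trained on each member's `Xp` then sees a differently distorted view of the data, which is where the ensemble diversity comes from.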

Collaboration


Dive into César Ignacio García-Osorio's collaborations.


Hujun Yin

University of Manchester


Peter Tino

University of Birmingham
