Miguel García-Torres

Publication

Featured research published by Miguel García-Torres.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Peakbin Selection in Mass Spectrometry Data Using a Consensus Approach with Estimation of Distribution Algorithms

Rubén Armañanzas; Yvan Saeys; Iñaki Inza; Miguel García-Torres; Concha Bielza; Y. Van de Peer; Pedro Larrañaga

Progress is continuously being made in the quest for stable biomarkers linked to complex diseases. Mass spectrometers are one of the tools used to tackle this problem. The data profiles they produce are noisy and unstable. In these profiles, biomarkers are detected as signal regions (peaks) where control and disease samples behave differently. Mass spectrometry (MS) data generally contain a limited number of samples described by a large number of features. In this work, we present estimation of distribution algorithms (EDAs), a class of evolutionary algorithms, as a novel and efficient peak selector in the MS domain. There is a trade-off between the reliability of the detected biomarkers and the low number of samples available for analysis. For this reason, we introduce a consensus approach, built upon the classical EDA scheme, that improves the stability and robustness of the final set of relevant peaks. An entire data workflow is designed to yield unbiased results. Four publicly available MS data sets (two MALDI-TOF and two SELDI-TOF) are analyzed. The results are compared with those of the original works, and a new plot (the peak frequential plot) for graphically inspecting the relevant peaks is introduced. A complete online supplementary page, available at http://www.sc.ehu.es/ccwbayes/members/ruben/ms, includes extended information and results, in addition to Matlab scripts and references.
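
As an illustration of the EDA idea in this abstract, here is a minimal sketch of the simplest EDA, the univariate marginal distribution algorithm (UMDA), searching over binary peak-inclusion masks, followed by a consensus step that keeps the peaks selected in a given fraction of independent runs. This is not the authors' Matlab code: the fitness function, population sizes, probability clamping and consensus threshold are illustrative assumptions, and `umda_select` and `consensus` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # mask[i] == 1 means peak i is selected


def umda_select(n_features: int, fitness: Callable[[Mask], float],
                pop_size: int = 40, n_select: int = 20,
                generations: int = 30, seed: int = 0) -> Mask:
    """UMDA over binary masks; returns the best mask seen in the run."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # marginal probability that each peak is selected
    best: Mask = []
    best_fit = float("-inf")
    for _ in range(generations):
        # Sample a population from the current product of Bernoulli marginals.
        pop = [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
        # Re-estimate each marginal from the selected (fittest) individuals,
        # clamped to [0.05, 0.95] so the search keeps some diversity.
        elite = pop[:n_select]
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / len(elite)))
                 for i in range(n_features)]
    return best


def consensus(masks: Sequence[Mask], threshold: float = 0.5) -> List[int]:
    """Indices of peaks selected in at least `threshold` of the runs."""
    n_runs = len(masks)
    return [i for i in range(len(masks[0]))
            if sum(m[i] for m in masks) / n_runs >= threshold]


if __name__ == "__main__":
    relevant = {0, 1, 2}  # toy ground truth: only the first three peaks matter

    def toy_fitness(mask: Mask) -> float:
        return sum(1 if i in relevant else -1 for i, bit in enumerate(mask) if bit)

    runs = [umda_select(10, toy_fitness, seed=s) for s in range(5)]
    print(consensus(runs, threshold=0.8))
```

Running several seeded runs and voting is what makes the final peak set less sensitive to any single run on a small sample.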

Information Sciences | 2016

High-dimensional feature selection via feature grouping

Miguel García-Torres; Francisco Gómez-Vela; Belén Melián-Batista; J. Marcos Moreno-Vega

We introduce the concept of a predominant group, based on the idea of the Markov blanket, to identify groups of correlated features. We propose a greedy strategy (GreedyPGG) that groups features based on the concept of predominant groups. We propose a VNS metaheuristic that uses the GreedyPGG strategy to reduce the dimensionality of high-dimensional data. Results show that VNS finds smaller subsets of features without degrading the predictive model.

In recent years, advances in technology have led to increasingly high-dimensional datasets. This increase in dimensionality, along with the presence of irrelevant and redundant features, makes the feature selection process challenging with respect to both efficiency and effectiveness. In this context, approximate algorithms are typically applied since they provide good solutions in a reasonable time. Feature grouping, in turn, has emerged as a powerful approach to reducing dimensionality in high-dimensional data. Recently, some authors have focused on developing methods that combine feature grouping and feature selection to improve the model. In this paper, we propose a feature selection strategy that uses feature grouping to increase the effectiveness of the search. As the search strategy, we propose a Variable Neighborhood Search (VNS) metaheuristic. We then group the input space into subsets of features using the concept of Markov blankets. To the best of our knowledge, this is the first time the Markov blanket has been used to group features. We test the performance of VNS through experiments on several high-dimensional datasets from two domains: microarray and text mining. We compare VNS with popular and competitive techniques. Results show that VNS is a competitive strategy, capable of finding small feature subsets with predictive power similar to that obtained by the other algorithms used in this study.
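
The VNS component can be sketched as follows, assuming a basic VNS scheme (shaking by flipping k random features, then first-improvement local search) rather than the paper's exact configuration. The `candidates` list is a hypothetical stand-in for the reduced search space produced by feature grouping (for example, one representative per group), and `score` is a placeholder for whatever subset evaluation is used, such as a classifier's validation accuracy.

```python
import random
from typing import Callable, FrozenSet, Sequence

Subset = FrozenSet[int]


def local_search(start: Subset, candidates: Sequence[int],
                 score: Callable[[Subset], float]) -> Subset:
    """First-improvement local search: flip one candidate feature at a time."""
    best, best_score = start, score(start)
    improved = True
    while improved:
        improved = False
        for f in candidates:
            neighbour = best ^ {f}  # add f if absent, drop it if present
            neighbour_score = score(neighbour)
            if neighbour_score > best_score:
                best, best_score = neighbour, neighbour_score
                improved = True
                break  # restart the scan from the improved subset
    return best


def vns(candidates: Sequence[int], score: Callable[[Subset], float],
        k_max: int = 3, max_iter: int = 50, seed: int = 0) -> Subset:
    """Basic VNS: shake in neighbourhood k, descend, move only on improvement."""
    rng = random.Random(seed)
    pool = list(candidates)
    current = local_search(frozenset(), pool, score)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            # Shaking: flip k randomly chosen candidate features.
            flipped = frozenset(rng.sample(pool, min(k, len(pool))))
            trial = local_search(current ^ flipped, pool, score)
            if score(trial) > score(current):
                current, k = trial, 1  # improvement: back to the first neighbourhood
            else:
                k += 1  # no improvement: try a larger neighbourhood
    return current
```

Shrinking `candidates` through grouping is what keeps each local-search scan affordable when the raw feature count is in the thousands.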

Expert Systems With Applications | 2012

Fast feature selection aimed at high-dimensional data via hybrid-sequential-ranked searches

Roberto Ruiz; José C. Riquelme; Jesús S. Aguilar-Ruiz; Miguel García-Torres

We address the feature subset selection problem for classification tasks. We examine the performance of two hybrid strategies that search directly on a ranked list of features and compare them with two widely used algorithms, the fast correlation-based filter (FCBF) and sequential forward selection (SFS). The proposed hybrid approaches make it possible to apply any subset evaluator efficiently, including a wrapper model, to large, high-dimensional domains. The experiments show that our two strategies are competitive and can select a small subset of features without increasing the classification error or sacrificing the advantages of the strategies under study.
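
A simplified sketch of searching directly on a ranked list: rank features once with a cheap univariate score, then walk the ranking and keep a feature only if the subset evaluator improves. The function names, the `min_gain` threshold and the plain improvement test are assumptions for illustration; the published strategies may use different acceptance criteria. The point is that the subset evaluator (possibly an expensive wrapper) is called once per feature, instead of once per remaining feature at every step as in SFS.

```python
from typing import Callable, List, Sequence


def incremental_ranked_search(features: Sequence[int],
                              rank_score: Callable[[int], float],
                              subset_score: Callable[[List[int]], float],
                              min_gain: float = 0.0) -> List[int]:
    """Walk a univariate ranking once, keeping features that improve the subset."""
    # Step 1: cheap univariate ranking, most relevant feature first.
    ranked = sorted(features, key=rank_score, reverse=True)
    selected: List[int] = []
    # subset_score must accept the empty subset (e.g. majority-class accuracy).
    best = subset_score(selected)
    # Step 2: a single pass over the ranking; the evaluator runs once per feature.
    for f in ranked:
        candidate = selected + [f]
        candidate_score = subset_score(candidate)
        if candidate_score > best + min_gain:
            selected, best = candidate, candidate_score
    return selected
```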

Information Sciences | 2013

Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

Miguel García-Torres; Rubén Armañanzas; Concha Bielza; Pedro Larrañaga

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we compare the performance of four metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different schemes, filter and wrapper, coupled with Naive Bayes and SVM classifiers.
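
One of the compared searches, best first (BF), can be sketched as a forward search over subsets driven by a priority queue. The `evaluate` argument is what turns the same search into a filter or a wrapper: a filter passes a data-intrinsic subset criterion, while a wrapper passes, for example, the cross-validated accuracy of a Naive Bayes or SVM classifier (not implemented here). The stopping rule, `max_stale` consecutive expansions without improvement, is an assumption.

```python
import heapq
import itertools
from typing import Callable, FrozenSet, Iterable

Subset = FrozenSet[int]


def best_first(features: Iterable[int], evaluate: Callable[[Subset], float],
               max_stale: int = 5) -> Subset:
    """Forward best-first search; stops after max_stale non-improving expansions."""
    pool = list(features)
    counter = itertools.count()  # tie-breaker so heapq never compares frozensets
    start: Subset = frozenset()
    best, best_score = start, evaluate(start)
    open_list = [(-best_score, next(counter), start)]  # max-heap via negated score
    seen = {start}
    stale = 0
    while open_list and stale < max_stale:
        _, _, node = heapq.heappop(open_list)  # most promising subset so far
        improved = False
        for f in pool:
            if f in node:
                continue
            child = node | {f}
            if child in seen:
                continue
            seen.add(child)
            score = evaluate(child)
            heapq.heappush(open_list, (-score, next(counter), child))
            if score > best_score:
                best, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best
```

Unlike plain forward selection, the open list lets the search back up to an earlier, less promising subset when the current branch stops improving.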

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2010

Feature selection applied to data from the Sloan digital sky survey

Miguel Á. Montero; Roberto Ruiz; Miguel García-Torres; L. M. Sarro

In recent years there has been an explosion in the rate of acquisition of astronomical data. The analysis of astronomical data presents unprecedented opportunities and challenges for data mining tasks such as clustering, object discovery and classification. In this work, we address the feature selection problem in the classification of photometric and spectroscopic data collected by the Sloan Digital Sky Survey (SDSS). We compare five feature selection algorithms: best first (BF), scatter search (SS), genetic algorithm (GA), best incremental ranked subset (BI) and best agglomerative ranked subset (BA). This paper is the first to apply any of these strategies to the study of relevant features in SDSS data.
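
For the genetic algorithm entry in this comparison, a textbook generational GA over binary feature masks looks roughly like this. Binary tournament selection, uniform crossover, bit-flip mutation, elitism and all parameter values are illustrative assumptions, not the configuration used in the paper; `fitness` stands for any subset evaluation.

```python
import random
from typing import Callable, List

Mask = List[int]  # mask[i] == 1 means feature i is selected


def genetic_select(n_features: int, fitness: Callable[[Mask], float],
                   pop_size: int = 30, generations: int = 40,
                   p_mut: float = 0.05, seed: int = 0) -> Mask:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament: the fitter of two randomly drawn individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then independent bit-flip mutation per gene.
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

Elitism guarantees that the best fitness in the population never decreases from one generation to the next.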

PLOS ONE | 2018

The blessing of Dimensionality: Feature Selection outperforms functional connectivity-based feature transformation to classify ADHD subjects from EEG patterns of phase synchronisation

Ernesto Pereda; Miguel García-Torres; Belén Melián-Batista; Soledad Mañas; Leopoldo D. Méndez; Julián J. González

Functional connectivity (FC) characterizes brain activity from a multivariate set of N brain signals by means of an N×N matrix A, whose elements estimate the dependence between each possible pair of signals. Such a matrix can be used as a feature vector for (un)supervised subject classification. Yet if N is large, A is high-dimensional. Little is known about the effect that different strategies to reduce its dimensionality may have on its classification ability. Here, we apply different machine learning algorithms to classify 33 children (aged 6-14 years) into two groups (healthy controls and patients with Attention Deficit Hyperactivity Disorder) using EEG FC patterns obtained from two phase synchronisation indices. We found that classification is highly successful (a classification rate of around 95%) if the whole matrix A is taken into account and the relevant features are selected using machine learning methods. However, if FC algorithms are instead applied to transform A into a lower-dimensional matrix, the classification rate drops below 80%. We conclude that, for the purpose of pattern classification, the relevant features should be selected from among the elements of A using appropriate machine learning algorithms.
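
To make the dimensionality argument concrete: A is symmetric, so its informative content is the N(N-1)/2 entries above the diagonal, which become the candidate features for selection. The sketch below builds A with the phase locking value (PLV), assumed here only as an example phase synchronisation index (the abstract does not name the two indices used), from instantaneous phases that are assumed to be precomputed (for example, via a Hilbert transform).

```python
import cmath
from typing import List, Sequence


def plv(phase_a: Sequence[float], phase_b: Sequence[float]) -> float:
    """Phase locking value |mean(exp(i*(phi_a - phi_b)))|, between 0 and 1."""
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


def fc_matrix(phases: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric N x N connectivity matrix A with ones on the diagonal."""
    n = len(phases)
    A = [[1.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1, n):
            A[j][k] = A[k][j] = plv(phases[j], phases[k])
    return A


def upper_triangle(A: List[List[float]]) -> List[float]:
    """Vectorise the N(N-1)/2 entries above the diagonal as a feature vector."""
    n = len(A)
    return [A[j][k] for j in range(n) for k in range(j + 1, n)]
```

With N = 19 channels, for instance, each subject yields 19·18/2 = 171 features, which feature selection then prunes instead of transforming A into a smaller matrix.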

International Conference on Bioinformatics and Biomedical Engineering | 2017

Bioinformatics from a Big Data Perspective: Meeting the Challenge

Francisco Gómez-Vela; Aurelio López; José Antonio Lagares; Domingo S. Rodríguez Baena; Carlos D. Barranco; Miguel García-Torres; Federico Divina

Recently, the rise of the Big Data paradigm has had a great impact on several fields, and bioinformatics is one of them. In fact, bioinformatics has had to evolve to adapt to this phenomenon. The exponential increase in available biological information forced researchers to find new solutions to handle these challenges.

Hybrid Artificial Intelligence Systems | 2016

Feature Selection Using Approximate Multivariate Markov Blankets

Rafael Arias-Michel; Miguel García-Torres; Christian Schaerer; Federico Divina

In classification tasks, feature selection has become an important research area. In general, the performance of a classifier is intrinsically affected by the existence of irrelevant and redundant features. Markov blanket discovery can be used to identify an optimal subset of features. The Approximate Markov blanket (AMb) is a standard approach to induce Markov blankets from data. However, this approach considers only pairwise comparisons of features. In this paper, we introduce a multivariate approach to the AMb definition, called the Approximate Multivariate Markov blanket (AMMb), which takes into account interactions among different features of a given subset. To test the AMMb, we consider a backward strategy similar to the Fast Correlation Based Filter (FCBF) that incorporates our proposal. The resulting algorithm, named FCBF_ntc, is compared against FCBF, Best First (BF) and Sequential Forward Selection (SFS) on both synthetic and real-world datasets. Results show that including interactions among the features in a subset may yield smaller subsets of features without degrading the classification task.
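
For reference, the pairwise AMb test that AMMb generalizes is the one used by FCBF: feature f_p is an approximate Markov blanket for f_q when f_p is at least as relevant to the class C as f_q and SU(f_p, f_q) ≥ SU(f_q, C), where SU is symmetrical uncertainty. The sketch below implements this standard FCBF baseline on discrete features; it does not implement the multivariate AMMb, whose exact definition is given in the paper, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def su(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Symmetrical uncertainty 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf(columns: Sequence[Sequence[Hashable]], target: Sequence[Hashable],
         delta: float = 0.0) -> List[int]:
    # Relevance: keep features with SU(f, C) >= delta, most relevant first.
    relevance = {i: su(col, target) for i, col in enumerate(columns)}
    order = sorted((i for i in relevance if relevance[i] >= delta),
                   key=lambda i: relevance[i], reverse=True)
    selected: List[int] = []
    # Redundancy: drop every later f_q for which f_p is an approximate
    # Markov blanket, i.e. SU(f_p, f_q) >= SU(f_q, C).
    while order:
        p = order.pop(0)
        selected.append(p)
        order = [q for q in order if su(columns[p], columns[q]) < relevance[q]]
    return selected
```

Because each test only compares two features at a time, a feature that is redundant only jointly with several others survives; that is the gap the multivariate AMMb addresses.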

2015 International Workshop on Data Mining with Industrial Applications (DMIA) | 2015

Feature Grouping and Selection on High-Dimensional Microarray Data

Miguel García-Torres; Francisco Gómez-Vela; David Becerra-Alonso; Belén Melián-Batista; J. Marcos Moreno-Vega

In classification tasks, as the dimensionality increases, the performance of the classifier improves until an optimal number of features is reached. Further increases in dimensionality without increasing the number of training samples result in degraded classifier performance. This phenomenon, known as the curse of dimensionality, has become more relevant with the advent of larger datasets and the demands of Knowledge Discovery from Big Data. In this context, feature grouping has become an effective approach to provide additional information about relationships between features. In this work, we propose a greedy strategy, called GreedyPGG, that groups features based on the concept of Markov blankets. To this end, we introduce the idea of a predominant group of features. We also present an adaptation of Variable Neighborhood Search (VNS) to high-dimensional feature selection that uses GreedyPGG to reduce the search space. We test the effectiveness of GreedyPGG on synthetic datasets and of VNS on microarray datasets, and we compare VNS with popular and competitive strategies. Results show that GreedyPGG groups correlated features efficiently and that VNS is a competitive strategy, capable of finding a small number of features with high predictive power.
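
The following is a hypothetical simplification of grouping by predominant features, not the published GreedyPGG: features with zero relevance are discarded, then the most relevant unassigned feature leads a new group and absorbs every remaining feature for which it is an approximate Markov blanket (the FCBF-style test based on symmetrical uncertainty). The function name `predominant_groups` and the zero-relevance rule are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def su(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Symmetrical uncertainty 2 * I(X;Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    return 2.0 * (hx + hy - entropy(list(zip(x, y)))) / (hx + hy)


def predominant_groups(columns: Sequence[Sequence[Hashable]],
                       target: Sequence[Hashable]) -> List[List[int]]:
    relevance = [su(col, target) for col in columns]
    # Assumed rule: features with zero SU to the class join no group.
    order = sorted((i for i in range(len(columns)) if relevance[i] > 0),
                   key=lambda i: relevance[i], reverse=True)
    groups: List[List[int]] = []
    while order:
        head = order.pop(0)  # most relevant unassigned feature leads the group
        # Absorb the features for which `head` is an approximate Markov blanket.
        members = [q for q in order
                   if su(columns[head], columns[q]) >= relevance[q]]
        groups.append([head] + members)
        order = [q for q in order if q not in members]
    return groups
```

Taking one representative per group, e.g. `[g[0] for g in groups]`, would give the reduced candidate list that a VNS search could explore instead of the full feature set.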

2015 International Workshop on Data Mining with Industrial Applications (DMIA) | 2015

Feature Selection via Approximated Markov Blankets Using the CFS Method

Rafael Arias-Michel; Miguel García-Torres; Christian Schaerer; Federico Divina

Feature selection has become an important research area in machine learning due to rapid advances in technology. In high-dimensional spaces, the difficulty of classification is intrinsically caused by the existence of irrelevant and redundant features that, in general, degrade the performance of a classifier. Moreover, finding the optimal subset of features becomes intractable even for low-dimensional datasets. In this context, Markov blanket discovery can be used to identify such a subset. The approximated Markov blanket (AMb) is an efficient and effective approach to induce Markov blankets from data. However, this approach only considers pairwise comparisons of features. In this paper, we redefine the AMb to consider the interaction among the features of a given subset. We use the Correlation based Feature Selection (CFS) function to measure such interactions and the Fast Correlation based Filter (FCBF) as the search strategy. The proposal, denoted FCBFCFS, is compared with FCBF and tested on synthetic and real-world datasets from the microarray domain. Results show that including interactions among the features in a subset may lead to smaller subsets of features without degrading the classification task.
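
The CFS function mentioned here scores a subset S of k features with Hall's merit heuristic, Merit_S = k·mean(r_cf) / sqrt(k + k(k-1)·mean(r_ff)), where r_cf are feature-class correlations and r_ff are pairwise feature-feature correlations; the denominator is where feature interactions enter. The sketch assumes those correlations (for example, symmetrical uncertainty values) are precomputed and passed in as dictionaries; how FCBFCFS plugs the merit into its redefined AMb is not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def cfs_merit(subset: Sequence[int], r_cf: Dict[int, float],
              r_ff: Dict[FrozenSet[int], float]) -> float:
    """Hall's CFS merit of a feature subset from precomputed correlations."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k  # average relevance to the class
    pairs = list(combinations(subset, 2))
    # Average redundancy among the selected features (0 for a single feature).
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)
```

Two equally relevant features score higher together when they are uncorrelated than when they duplicate each other, which is exactly the subset-level interaction that a pairwise AMb test cannot see.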
