Diego Peteiro-Barral
University of A Coruña
Publications
Featured research published by Diego Peteiro-Barral.
International Journal of Neural Systems | 2015
Enrique Castillo; Diego Peteiro-Barral; Bertha Guijarro-Berdiñas; Oscar Fontenla-Romero
This paper presents a novel distributed one-class classification approach based on an extension of the ν-SVM method, permitting its application to Big Data sets. The method considers several one-class classifiers, each one determined using a given local data partition on a processor, with the goal of finding a global model. Its cornerstone is a novel mathematical formulation that makes the optimization problem separable whilst avoiding some data points considered as outliers in the final solution. This is particularly important because the decision region generated by the method is unaffected by the position of the outliers and fits the form of the data more precisely. Another interesting property is that, although built in parallel, the classifiers exchange data during learning in order to improve their individual specialization. Experimental results using different datasets demonstrate the good accuracy of the decision regions of the proposed method in comparison with other well-known classifiers, while saving training time due to its distributed nature.
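The partition-then-combine idea behind the distributed scheme can be illustrated with a minimal sketch. The paper's actual method extends ν-SVM and exchanges data between processors during learning; here a toy centroid-plus-radius model per partition (a hypothetical stand-in, not the paper's formulation) plays the role of each local one-class classifier, and the global decision is the union of the local acceptance regions.

```python
import math
import random

def train_local(points, nu=0.1):
    """Fit a toy one-class model on one partition: a centroid plus a radius
    chosen so that roughly a fraction nu of the points fall outside (outliers)."""
    d = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(d)]
    dists = sorted(math.dist(p, centroid) for p in points)
    radius = dists[max(0, int(math.ceil((1 - nu) * len(dists))) - 1)]
    return centroid, radius

def global_predict(models, x):
    """A point is accepted if any local model accepts it (union of regions)."""
    return any(math.dist(x, c) <= r for c, r in models)

random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
partitions = [data[i::3] for i in range(3)]           # 3 simulated processors
models = [train_local(part, nu=0.1) for part in partitions]

print(global_predict(models, (0.0, 0.0)))   # near the bulk of the data: True
print(global_predict(models, (8.0, 8.0)))   # far from all partitions: False
```

In the real method the local models would additionally share selected points so each classifier specializes; this sketch only shows how local partitions yield a global decision region.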
Progress in Artificial Intelligence | 2013
Diego Peteiro-Barral; Bertha Guijarro-Berdiñas
Traditionally, a bottleneck preventing the development of more intelligent systems was the limited amount of data available. Nowadays, the total amount of information is almost incalculable and automatic data analyzers are more necessary than ever. However, the limiting factor is now the inability of learning algorithms to use all the data within a reasonable time. To handle this problem, a new field in machine learning has emerged: large-scale learning. In this context, distributed learning seems a promising line of research, since allocating the learning process among several workstations is a natural way of scaling up learning algorithms. Moreover, it allows dealing with data sets that are naturally distributed, a frequent situation in many real applications. This study provides some background regarding the advantages of distributed environments as well as an overview of distributed learning for dealing with “very large” data sets.
IEEE Journal of Biomedical and Health Informatics | 2014
Beatriz Remeseiro; Verónica Bolón-Canedo; Diego Peteiro-Barral; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; A. Mosquera; Manuel G. Penedo; Noelia Sánchez-Maroño
Dry eye is a symptomatic disease which affects a wide range of the population and has a negative impact on their daily activities. Its diagnosis can be achieved by analyzing the interference patterns of the tear film lipid layer and classifying them into one of the Guillon categories. The manual process performed by experts is not only affected by subjective factors but is also very time-consuming. In this paper we propose a general methodology for the automatic classification of the tear film lipid layer, using color and texture information to characterize the images and feature selection methods to reduce processing time. The adequacy of the proposed methodology was demonstrated: it achieves classification rates over 97% while maintaining robustness and providing unbiased results. It can also be applied in real time, allowing important time savings for the experts.
Expert Systems With Applications | 2013
Diego Peteiro-Barral; Verónica Bolón-Canedo; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; Noelia Sánchez-Maroño
In the past few years, the bottleneck for machine learning developers has no longer been the limited data available but the algorithms' inability to use all the data in the available time. For this reason, researchers are now interested not only in the accuracy but also in the scalability of machine learning algorithms. To deal with large-scale databases, feature selection can help reduce their dimensionality, turning an impracticable algorithm into a practical one. In this research, the influence of several feature selection methods on the scalability of four of the most well-known training algorithms for feedforward artificial neural networks (ANNs) is analyzed over both classification and regression tasks. The results demonstrate that feature selection is an effective tool to improve scalability.
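The dimensionality-reduction step described above can be sketched with a minimal filter-style selector. This is not one of the specific methods evaluated in the paper; it is a simple correlation-based ranking, shown only to illustrate how a filter shrinks the input dimension before ANN training. The data and informative-feature indices are synthetic.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation between two sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = (sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)) ** 0.5
    return num / den if den else 0.0

def select_top_k(X, y, k):
    """Rank features by |correlation with the target| and keep the best k."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    keep = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return sorted(keep)

random.seed(1)
n, d = 200, 20
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Only features 3 and 7 actually drive the target; the rest are noise.
y = [row[3] + 0.5 * row[7] + random.gauss(0, 0.1) for row in X]

print(select_top_k(X, y, 2))   # → [3, 7]
```

Training an ANN on the 2 selected columns instead of all 20 is what improves scalability: the cost of each training step grows with the input dimension.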
Expert Systems With Applications | 2016
Verónica Bolón-Canedo; Diego Fernández-Francos; Diego Peteiro-Barral; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; Noelia Sánchez-Maroño
Highlights: a pipeline for online feature selection is proposed, covering discretization, feature selection and classification; classical algorithms (the k-means discretizer, the χ² filter and artificial neural networks) were modified to work online; results show that the classification error decreases, adapting to the arrival of new data.
With the advent of Big Data, data is being collected at an unprecedented pace and needs to be processed in a short time. To deal with data streams that flow continuously, classical batch learning algorithms cannot be applied and it is necessary to employ online approaches. Online learning consists of continuously revising and refining a model by incorporating new data as they arrive, and it allows important problems such as concept drift or the management of extremely high-dimensional datasets to be solved. In this paper, we present a unified pipeline for online learning which covers online discretization, feature selection and classification. Three classical methods (the k-means discretizer, the χ² filter and a one-layer artificial neural network) have been reimplemented to be able to tackle online data, showing promising results on both synthetic and real datasets.
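The feature-selection stage of such a pipeline can be sketched with an incrementally maintained χ² statistic: contingency counts are updated per sample as the stream arrives, so the score is available at any time without revisiting old data. This is a simplified illustration of the filter stage only (the paper's pipeline also includes a k-means discretizer and a one-layer ANN); the stream below is synthetic, with one feature perfectly tied to the label and one independent of it.

```python
from collections import defaultdict

class OnlineChi2:
    """Incrementally maintained chi-square score of one discrete feature
    against the class label; counts are updated as each sample arrives."""
    def __init__(self):
        self.counts = defaultdict(int)   # (feature_value, label) -> n
        self.f_tot = defaultdict(int)    # feature_value -> n
        self.l_tot = defaultdict(int)    # label -> n
        self.n = 0

    def update(self, value, label):
        self.counts[(value, label)] += 1
        self.f_tot[value] += 1
        self.l_tot[label] += 1
        self.n += 1

    def score(self):
        chi2 = 0.0
        for v in self.f_tot:
            for l in self.l_tot:
                expected = self.f_tot[v] * self.l_tot[l] / self.n
                chi2 += (self.counts[(v, l)] - expected) ** 2 / expected
        return chi2

relevant, noise = OnlineChi2(), OnlineChi2()
# Each sample is (feature_a, feature_b, label): feature_a equals the label,
# feature_b is independent of it.
stream = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)] * 25
for a, b, label in stream:
    relevant.update(a, label)
    noise.update(b, label)

print(relevant.score(), noise.score())   # → 100.0 0.0
```

A streaming selector would periodically keep the features whose running χ² score exceeds a threshold, adapting the selected subset as new data arrive.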
Expert Systems With Applications | 2013
Diego Peteiro-Barral; Bertha Guijarro-Berdiñas; Beatriz Pérez-Sánchez; Oscar Fontenla-Romero
Until recently, the most common criterion in machine learning for evaluating the performance of algorithms was accuracy. However, the unrestrainable growth of the volume of data in recent years in fields such as bioinformatics, intrusion detection or engineering has raised new challenges in machine learning regarding not simply accuracy but also scalability. In this research, we are concerned with the scalability of one of the most well-known paradigms in machine learning, artificial neural networks (ANNs), and particularly with the training algorithm Sensitivity-Based Linear Learning Method (SBLLM). SBLLM is a learning method for two-layer feedforward ANNs based on sensitivity analysis, which calculates the weights by solving a linear system of equations. The results show that the training algorithm SBLLM performs better in terms of scalability than five of the most popular and efficient training algorithms for ANNs.
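The core idea of obtaining weights by solving a linear system, rather than by iterative gradient descent, can be shown on the simplest possible case: a single linear unit fitted by the normal equations. This is not SBLLM itself (which handles two-layer networks via sensitivity analysis), only a minimal sketch of the "weights as the solution of a linear system" principle on made-up data.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]          # roughly y = 2x + 1

# Normal equations for the weights (w, b) of y ≈ w*x + b: one linear solve,
# no iterative epochs over the data.
A = [[sum(x * x for x in xs), sum(xs)],
     [sum(xs),                len(xs)]]
rhs = [sum(x * y for x, y in zip(xs, ys)), sum(ys)]
w, b = solve2(A, rhs)

print(round(w, 2), round(b, 2))    # → 1.98 1.03
```

The scalability advantage follows from the same structure: a single pass builds the system's coefficients, so training cost grows linearly with the number of samples.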
International Conference on Tools with Artificial Intelligence | 2012
Verónica Bolón-Canedo; Diego Peteiro-Barral; Beatriz Remeseiro; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; A. Mosquera; Manuel G. Penedo; Noelia Sánchez-Maroño
Dry eye is a symptomatic disease which affects a wide range of the population and has a negative impact on their daily activities, such as driving or working with computers. Its diagnosis can be achieved by several clinical tests, one of which is the analysis of the interference pattern and its classification into one of the Guillon categories. Existing methodologies for automatic classification obtain promising results, but at the expense of a long processing time. In this research, feature selection techniques are used to reduce this time whilst maintaining performance, paving the way for a novel tool for the automatic classification of the tear film lipid layer. This tool achieves classification rates over 96% compared with the annotations of the optometrists and provides unbiased results. It also works in real time, allowing important time savings for the experts.
CAEPIA'11: Proceedings of the 14th International Conference on Advances in Artificial Intelligence, Spanish Association for Artificial Intelligence | 2011
Verónica Bolón-Canedo; Diego Peteiro-Barral; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; Noelia Sánchez-Maroño
The advent of high-dimensionality problems has brought new challenges for machine learning researchers, who are now interested not only in the accuracy but also in the scalability of algorithms. In this context, machine learning can take advantage of feature selection methods to deal with large-scale databases. Feature selection is able to reduce the temporal and spatial complexity of learning, turning an impracticable algorithm into a practical one. In this work, the influence of feature selection on the scalability of four of the most well-known training algorithms for feedforward artificial neural networks (ANNs) is studied. Six different measures are considered to evaluate scalability, allowing a final score to be established to compare the algorithms. Results show that, when a feature selection step is included, ANN algorithms perform much better in terms of scalability.
Knowledge and Information Systems | 2018
Verónica Bolón-Canedo; D. Rego-Fernández; Diego Peteiro-Barral; Amparo Alonso-Betanzos; Bertha Guijarro-Berdiñas; Noelia Sánchez-Maroño
Lately, driven by the explosion of high dimensionality, researchers in machine learning have become interested not only in accuracy but also in scalability. Although the scalability of learning methods is a trending issue, the scalability of feature selection methods has not received the same amount of attention. This research analyzes the scalability of state-of-the-art feature selection methods belonging to the filter, embedded and wrapper approaches. For this purpose, several new measures are presented, based not only on accuracy but also on execution time and stability. Results on seven classical artificial datasets are presented and discussed, as well as two case studies analyzing the particularities of microarray data and the effect of redundancy. To check whether the results can be generalized, we included experiments with two real datasets. As expected, filters are the most scalable feature selection approach, with INTERACT, ReliefF and mRMR being the most accurate methods.
International Conference on Artificial Intelligence and Soft Computing | 2013
Diego Peteiro-Barral; Bertha Guijarro-Berdiñas
In recent years, the unrestrainable growth of the volume of data has raised new challenges in machine learning regarding scalability. Scalability comprises not simply accuracy but several other measures of computational resources. In order to compare the scalability of algorithms, it is necessary to establish a method that integrates all these measures into a single ranking. Such a method should be able to (i) merge results of the algorithms under comparison across different benchmark data sets, (ii) quantitatively measure the difference between algorithms, and (iii) weight some measures against others if necessary. To manage these issues, this research proposes the use of TOPSIS as a multiple-criteria decision-making method to rank algorithms. The use of the method is illustrated with a study on the scalability of five of the most well-known training algorithms for artificial neural networks (ANNs).
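TOPSIS itself is a standard, well-defined procedure, so a compact sketch is possible: normalize and weight the decision matrix, locate the ideal and anti-ideal solutions, and rank alternatives by relative closeness to the ideal. The algorithms, measures and weights below are hypothetical placeholders, not the paper's experimental values.

```python
def topsis(matrix, weights, benefit):
    """Rank alternatives by closeness to the ideal solution (TOPSIS).
    matrix[i][j]: score of alternative i on criterion j.
    benefit[j]:   True if higher is better for criterion j, False if lower is."""
    m, n = len(matrix), len(matrix[0])
    # Vector-normalize each column, then apply the criterion weights.
    norms = [sum(matrix[i][j] ** 2 for i in range(m)) ** 0.5 for j in range(n)]
    V = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    ideal = [max(V[i][j] for i in range(m)) if benefit[j]
             else min(V[i][j] for i in range(m)) for j in range(n)]
    worst = [min(V[i][j] for i in range(m)) if benefit[j]
             else max(V[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_pos = sum((V[i][j] - ideal[j]) ** 2 for j in range(n)) ** 0.5
        d_neg = sum((V[i][j] - worst[j]) ** 2 for j in range(n)) ** 0.5
        scores.append(d_neg / (d_pos + d_neg))   # closeness coefficient in [0, 1]
    return scores

# Hypothetical scalability measures for three training algorithms:
# columns = accuracy (benefit), training time in s (cost), memory in MB (cost).
algorithms = ["alg_a", "alg_b", "alg_c"]
M = [[0.92, 120.0, 512.0],
     [0.88,  40.0, 256.0],
     [0.90,  60.0, 300.0]]
scores = topsis(M, weights=[0.5, 0.3, 0.2], benefit=[True, False, False])
ranking = sorted(zip(algorithms, scores), key=lambda t: -t[1])

print(ranking[0][0])   # → alg_b
```

Point (iii) of the text corresponds to the `weights` argument: shifting weight toward accuracy versus time or memory changes which algorithm ends up ranked first.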