David Gutiérrez-Avilés
University of Seville
Publication
Featured research published by David Gutiérrez-Avilés.
Neurocomputing | 2014
David Gutiérrez-Avilés; Cristina Rubio-Escudero; Francisco Martínez-Álvarez; José C. Riquelme
Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested. Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. We present the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously. We have used TriGen to mine synthetic data as well as datasets from yeast (Saccharomyces cerevisiae) cell cycle and human inflammation and host response to injury experiments. TriGen has proved capable of extracting groups of genes with similar patterns in subsets of conditions and times, and these groups have been shown to be related in terms of their functional annotations extracted from the Gene Ontology.
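As a rough illustration of what a tricluster is, the sketch below treats expression data as a nested list indexed by gene, condition and time point, and a tricluster as one subset of indices per dimension. The `Tricluster` class, the `extract` helper and the `data[g][c][t]` layout are assumptions made for this example; the abstract does not describe how TriGen encodes its individuals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tricluster:
    """Hypothetical container: one index subset per dimension."""
    genes: Tuple[int, ...]
    conditions: Tuple[int, ...]
    times: Tuple[int, ...]


def extract(data: List[List[List[float]]], tri: Tricluster) -> List[List[List[float]]]:
    """Return the sub-cube of data[g][c][t] selected by the tricluster."""
    return [
        [[data[g][c][t] for t in tri.times] for c in tri.conditions]
        for g in tri.genes
    ]


# Toy 2 x 2 x 2 dataset whose values encode their own indices (100*g + 10*c + t).
toy = [[[100 * g + 10 * c + t for t in range(2)] for c in range(2)] for g in range(2)]
sub_cube = extract(toy, Tricluster(genes=(1,), conditions=(0, 1), times=(1,)))
```

A genetic algorithm would then search over such index subsets, scoring each candidate sub-cube with a fitness function.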
Entropy | 2015
Francisco Martínez-Álvarez; David Gutiérrez-Avilés; Antonio Morales-Esteban; Jorge Reyes; José L. Amaro-Mellado; Cristina Rubio-Escudero
A prior definition of seismogenic zones is required to perform a probabilistic seismic hazard analysis in areas of diffuse, low seismic activity. Traditional zoning methods are based on the available seismic catalog and the geological structures. It is accepted that thermal and resistant parameters of the crust provide better criteria for zoning. Nonetheless, deriving the rheological profiles introduces considerable uncertainty. This has generated inconsistencies, as different zones have been proposed for the same area. A new method for seismogenic zoning by means of triclustering is proposed in this research. Its main advantage is that it is based solely on seismic data. Almost no human decision is made, and therefore the method is nearly unbiased. To assess its performance, the method has been applied to the Iberian Peninsula, which is characterized by the occurrence of small to moderate magnitude earthquakes. The catalog of the National Geographic Institute of Spain has been used. The output map is checked for validity against the geology. Moreover, a geographic information system has been used for two purposes. First, the obtained zones have been depicted within it. Second, the data have been used to calculate the seismic parameters (b-value, annual rate). Finally, the results have been compared to Kohonen’s self-organizing maps.
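The two seismic parameters named above can be illustrated with a minimal sketch. It assumes the widely used Aki maximum-likelihood estimator for the b-value, with the usual half-bin correction for binned magnitudes; the abstract does not say which estimator the authors used, and the function names, the completeness magnitude and the bin width are all assumptions of this example.

```python
import math
from typing import Sequence


def b_value_aki(magnitudes: Sequence[float], completeness_mag: float,
                bin_width: float = 0.1) -> float:
    """Aki maximum-likelihood b-value for events at or above completeness_mag."""
    events = [m for m in magnitudes if m >= completeness_mag]
    mean_mag = sum(events) / len(events)
    # Half-bin correction accounts for magnitudes rounded to bin_width.
    return math.log10(math.e) / (mean_mag - (completeness_mag - bin_width / 2))


def annual_rate(magnitudes: Sequence[float], completeness_mag: float,
                years: float) -> float:
    """Mean number of events per year at or above completeness_mag."""
    return sum(1 for m in magnitudes if m >= completeness_mag) / years


# Made-up magnitudes for one zone, observed over two years.
zone_magnitudes = [2.0, 3.0, 3.5, 4.0]
b = b_value_aki(zone_magnitudes, completeness_mag=3.0)
rate = annual_rate(zone_magnitudes, completeness_mag=3.0, years=2)
```

In a zoning workflow, both functions would be applied separately to the events falling inside each zone.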
The Scientific World Journal | 2014
David Gutiérrez-Avilés; Cristina Rubio-Escudero
Microarrays have revolutionized biotechnological research. The analysis of new data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are applied to create groups of genes that exhibit a similar behavior. Biclustering emerges as a valuable tool for microarray data analysis since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. However, if a third dimension appears in the data, triclustering is the appropriate tool for the analysis. This occurs in longitudinal experiments in which the genes are evaluated under conditions at several time points. All clustering, biclustering, and triclustering techniques guide their search for solutions by a measure that evaluates the quality of clusters. We present an evaluation measure for triclusters called Mean Square Residue 3D. This measure is based on the classic biclustering measure Mean Square Residue. Mean Square Residue 3D has been applied to both synthetic and real data and has proved capable of extracting groups of genes with homogeneous patterns in subsets of conditions and times. These groups have shown a high correlation level and are also related in terms of their functional annotations extracted from the Gene Ontology project.
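For orientation, the classic two-dimensional Mean Square Residue (Cheng and Church) that the 3D measure builds on can be sketched as follows. This covers only the biclustering measure; the abstract does not give the 3D definition, so no attempt is made to reproduce it here, and the function name is an assumption of this example.

```python
from statistics import mean
from typing import List


def mean_square_residue(matrix: List[List[float]]) -> float:
    """Classic MSR of a bicluster given as a rows-by-columns matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [mean(row) for row in matrix]
    col_means = [mean(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    overall = mean(row_means)  # equals the grand mean because rows have equal length
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation of the cell from a purely additive row + column model.
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


additive = mean_square_residue([[1, 2, 3], [2, 3, 4], [4, 5, 6]])  # rows differ by a constant shift
checkerboard = mean_square_residue([[1, 0], [0, 1]])
```

A value near zero indicates that all rows follow the same pattern up to a shift, which is why lower scores mark more coherent groups.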
Evolutionary Bioinformatics | 2015
David Gutiérrez-Avilés; Cristina Rubio-Escudero
Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. Triclustering appears for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays relating subsets of genes, experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure, based on the similarity among the angles of the slopes of each profile formed by the genes, conditions, and times of the tricluster.
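The idea of comparing slope angles can be illustrated with a small sketch. It is not the published Multi Slope Measure, whose formula the abstract does not give. It only shows one plausible way to turn profiles into slope angles and compare them, assuming unit spacing between consecutive points, at least two profiles, and at least two points per profile. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def slope_angles(profile: Sequence[float]) -> List[float]:
    """Angle (radians) of each segment between consecutive points, unit spacing assumed."""
    return [math.atan(b - a) for a, b in zip(profile, profile[1:])]


def mean_angle_difference(profiles: Sequence[Sequence[float]]) -> float:
    """Average absolute angle difference over all pairs of profiles and all segments."""
    angle_lists = [slope_angles(p) for p in profiles]
    diffs = [
        abs(x - y)
        for u, v in combinations(angle_lists, 2)
        for x, y in zip(u, v)
    ]
    return sum(diffs) / len(diffs)


parallel = mean_angle_difference([[1, 2, 3], [5, 6, 7]])  # same shape, shifted
opposite = mean_angle_difference([[0, 1], [0, -1]])       # rising versus falling
```

Under this sketch, profiles with the same shape score zero regardless of their absolute level, while profiles moving in opposite directions score high.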
Bioinformatics and Biomedicine | 2014
David Gutiérrez-Avilés; Cristina Rubio-Escudero
Microarray technology has led to a great advance in biological studies due to its ability to monitor the RNA levels of a vast number of genes under certain experimental conditions. The use of computational techniques to mine hidden knowledge from these data is of great interest in research fields such as Data Mining and Bioinformatics. Finding patterns of genetic behavior that take into account not only the experimental conditions but also the time condition is a very challenging task. Clustering, biclustering and novel triclustering techniques offer a very suitable framework for solving this problem. In this work we present LSL, a measure to evaluate the quality of triclusters found in 3D data.
Hybrid Artificial Intelligence Systems | 2016
David Gutiérrez-Avilés; Cristina Rubio-Escudero
Triclustering has proved to be a valuable tool for the analysis of microarray data since its appearance as an improvement of classical clustering and biclustering techniques. Triclustering relaxes the constraints for grouping and allows genes to be evaluated under a subset of experimental conditions and a subset of time points simultaneously. The authors previously presented a genetic algorithm, TriGen, that finds triclusters of gene expression data. They also defined three different fitness functions for TriGen: \(MSR_{3D}\), LSL and MSL. In order to assess the results obtained by application of TriGen, a validity measure needs to be defined. Therefore, we present TRIQ, a validity measure which combines information from three different sources: (1) correlation among genes, conditions and times, (2) graphic validation of the patterns extracted and (3) functional annotations for the genes extracted.
Nature and Biologically Inspired Computing | 2011
David Gutiérrez-Avilés; Cristina Rubio-Escudero; José C. Riquelme
Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested. Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data in which the genes are evaluated under certain conditions at several time points. In previous work we presented the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously, and applied it to the yeast (Saccharomyces cerevisiae) cell cycle problem. In this article we present some improvements to the genetic algorithm, along with the results of applying the improved TriGen algorithm to the yeast cell cycle problem, where the goal is to identify all genes whose expression levels are regulated by the cell cycle.
CAEPIA'11: Proceedings of the 14th International Conference on Advances in Artificial Intelligence: Spanish Association for Artificial Intelligence | 2011
David Gutiérrez-Avilés; Cristina Rubio-Escudero; José C. Riquelme
Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested. Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of temporal microarray data in which the genes are evaluated under certain conditions at several time points. In this paper, we present the results of applying the TriGen algorithm, a genetic algorithm that finds triclusters that take into account the experimental conditions and the time points, to the yeast cell cycle problem, where the goal is to identify all genes whose expression levels are regulated by the cell cycle.
Hybrid Artificial Intelligence Systems | 2018
David Gutiérrez-Avilés; J. A. Fábregas; J. Tejedor; Francisco Martínez-Álvarez; Alicia Troncoso; A. Arcos; José C. Riquelme
The main objective of this paper is the application of big data analytics to a real case in the field of smart electric networks. Smart meters are not only elements to measure consumption, but they also constitute a network of millions of sensors in the electricity network. These sensors provide a huge amount of data that, once analyzed, can lead to significant advances for society. Accordingly, tools are being developed to reach certain goals, such as obtaining a better consumption estimation (which would imply better production planning), finding better rates based on time discrimination or contracted power, or minimizing the non-technical losses in the network, whose actual costs are eventually paid by end consumers, among others. In this work, real data from Spanish consumers have been analyzed to detect fraud in consumption. First, 1 TB of raw data was preprocessed in an HDFS-Spark infrastructure. Second, duplicated data and outliers were removed, and missing values were handled with specific big data algorithms. Third, customers were characterized by means of clustering techniques in different scenarios. Finally, several key factors in fraudulent consumption were found. Very promising results were achieved, approaching 80% accuracy.
BioData Mining | 2018
David Gutiérrez-Avilés; Raúl Giráldez; Francisco Javier Gil-Cumbreras; Cristina Rubio-Escudero
Background: Triclustering has proved to be a valuable tool for the analysis of microarray data since its appearance as an improvement of classical clustering and biclustering techniques. The standard for validation of triclustering is based on three different measures: correlation, graphic similarity of the patterns and functional annotations for the genes extracted from the Gene Ontology project (GO). Results: We propose TRIQ, a single evaluation measure that combines the three measures previously described: correlation, graphic validation and functional annotation. It provides a single value as the result of validating a tricluster solution, and therefore simplifies the comparison and selection of solutions. TRIQ has been applied to three datasets already studied and evaluated with single measures based on correlation, graphic similarity and GO terms. Triclusters have been extracted from these three datasets using two different algorithms: TriGen and OPTricluster. Conclusions: TRIQ has successfully provided the same results as the three single evaluation measures. Furthermore, we have applied TRIQ to results from another algorithm, OPTricluster, and we have shown that TRIQ is a valid tool for comparing results from different algorithms in a quantitative, straightforward manner. Therefore, it appears to be a valid measure to represent and summarize the quality of tricluster solutions. It is also feasible for the evaluation of non-biological triclusters, due to the parametrization of each component of TRIQ.