Daniel Urda | Researchain

Publication

Featured research published by Daniel Urda.

Theoretical Biology and Medical Modelling | 2014

Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data

Rafael Marcos Luque-Baena; Daniel Urda; José Luis Subirats; Leonardo Franco; José M. Jerez

Background: Extracting relevant information from microarray data is a very complex task because the data sets comprise a large number of features while few samples are generally available. Feature selection is therefore a very important part of the analysis, helping both to identify relevant genes and to maximize predictive information.
Methods: Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and Multilayer Perceptron (MLP)) are used within both frameworks on six freely available public datasets for the task of predicting cancer outcome.
Results: Better cancer outcome prediction results were obtained using the GA framework. In comparison to SFS, this approach leads to a larger selection set and uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also yielded a set of genes that can be considered more biologically relevant. Regarding the different classifiers used, standard feedforward neural networks (MLP), LDA and SVM gave similar, and the best, results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA yielded a more limited set of genes than SVM, NB and kNN, and C-MANTEC in particular proved to be the most robust classifier with respect to changes in the parameter settings.
Conclusions: This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results than the SFS approach, independently of the classifier used. Regarding classifiers, even though C-MANTEC did not achieve the best overall results, its performance was competitive and very robust with respect to the parameters of the algorithm, and thus it can be considered a candidate technique for future studies.
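The Stepwise Forward Selection procedure named in the abstract above can be illustrated with a minimal Python sketch. This is a generic greedy wrapper, not the authors' implementation: the function names, the stopping rule (stop when no candidate improves the score) and the toy scoring function are assumptions made for illustration. In practice the score would typically be a cross-validated accuracy of one of the classifiers listed above.

```python
from typing import Callable, List, Sequence


def forward_selection(n_features: int,
                      score: Callable[[Sequence[int]], float],
                      max_features: int) -> List[int]:
    """Greedy stepwise forward selection over feature indices.

    `score` is a user-supplied function (hypothetical here) that rates a
    candidate subset of feature indices; higher is better.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every candidate when added to the current subset.
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        best_score = top_score
    return selected


def toy_score(subset: Sequence[int]) -> float:
    """Toy stand-in for classifier accuracy: features 2 and 5 are
    'informative', and every selected feature carries a small penalty."""
    informative = {2, 5}
    return len(informative & set(subset)) - 0.1 * len(subset)


if __name__ == "__main__":
    print(forward_selection(n_features=8, score=toy_score, max_features=4))
```

The greedy loop evaluates only one added feature at a time, which is what makes SFS fast compared with a population-based search such as a genetic algorithm.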

Journal of Biomedical Informatics | 2014

Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords

Rafael Marcos Luque-Baena; Daniel Urda; M. Gonzalo Claros; Leonardo Franco; José M. Jerez

Genetic algorithms are widely used in the estimation of expression profiles from microarray data. However, these techniques are unable to produce stable and robust solutions suitable for use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection that combines the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination between relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnosis of cancer in the near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems | 2010

Constructive neural networks to predict breast cancer outcome by using gene expression profiles

Daniel Urda; José Luis Subirats; Leonardo Franco; José M. Jerez

Gene expression profiling strategies have attracted considerable interest from biologists due to the potential for high-throughput analysis of hundreds of thousands of gene transcripts. Methods using artificial neural networks (ANNs) were developed to identify an optimal subset of predictive gene transcripts from high-dimensional microarray data. The problem with using a stepwise forward selection ANN method is that it needs many different parameters depending on the complexity of the problem, and choosing the proper neural network architecture for a given classification problem is not trivial. A novel constructive neural network algorithm (CMantec) is applied in order to predict estrogen receptor (ER) status using data from microarray experiments. The results obtained show that the CMantec model clearly outperforms the ANN model both in execution time and in final prognostic accuracy. Therefore, CMantec appears to be a powerful tool for identifying gene signatures that predict the ER status of a given patient.

International Work-Conference on Artificial and Natural Neural Networks | 2017

Deep Learning to Analyze RNA-Seq Gene Expression Data

Daniel Urda; Julio Montes-Torres; Fernando Moreno; Leonardo Franco; José M. Jerez

Deep learning models are currently being applied in several areas with great success. However, their application to the analysis of high-throughput sequencing data remains a challenge for the research community, because this family of models is known to work very well on big datasets with many samples available, which is the opposite of the scenario typically found in biomedical areas. In this work, a first approach to the use of deep learning for the analysis of RNA-Seq gene expression profile data is provided. Three public cancer-related databases are analyzed using a regularized linear model (standard LASSO) as the baseline model, and two deep learning models that differ in the feature selection technique used prior to the application of a deep neural net model. The results indicate that a straightforward application of the deep net implementations available in public scientific tools, under the conditions described in this work, is not enough to outperform simpler models like LASSO. Therefore, smarter and more complex approaches that incorporate prior biological knowledge into the estimation procedure of deep learning models may be necessary in order to obtain better predictive performance.

PLOS ONE | 2016

Advanced Online Survival Analysis Tool for Predictive Modelling in Clinical Data Science.

Julio Montes-Torres; José Luis Subirats; Nuria Ribelles; Daniel Urda; Leonardo Franco; Emilio Alba; José M. Jerez

One of the prevailing applications of machine learning is the use of predictive modelling in clinical survival analysis. In this work, we present our view of the current situation of computer tools for survival analysis, stressing the need to transfer the latest results in the field of machine learning to biomedical researchers. We propose a web-based software tool for survival analysis called OSA (Online Survival Analysis), which has been developed as an open-access and user-friendly option to obtain discrete-time predictive survival models at the individual level using machine learning techniques, and to perform standard survival analysis. OSA employs an Artificial Neural Network (ANN) based method to produce the predictive survival models. Additionally, the software can easily generate survival and hazard curves with multiple options to personalise the plots, obtain contingency tables from the uploaded data to perform different tests, and fit a Cox regression model from a number of predictor variables. In the Materials and Methods section, we describe the general architecture of the application and introduce the mathematical background of each of the implemented methods. The study concludes with examples of use showing the results obtained with public datasets.

Computer Methods and Programs in Biomedicine | 2012

WIMP: Web server tool for missing data imputation

Daniel Urda; José Luis Subirats; Pedro J. García-Laencina; Leonardo Franco; José-Luis Sancho-Gómez; José M. Jerez

The imputation of unknown or missing data is a crucial task in the analysis of biomedical datasets. There are several situations where it is necessary to classify or identify instances given incomplete vectors, and the existence of missing values can substantially degrade the performance of the algorithms used for classification or recognition. The task of learning accurately from incomplete data raises a number of issues, some of which have not been completely solved in machine learning applications. In this sense, effective missing value estimation methods are required. Different methods for missing data imputation exist, but most of the time selecting the appropriate technique involves testing several methods, comparing them and choosing the right one. Furthermore, applying these methods is in most cases not straightforward, as they involve several technical details, and in particular cases, such as when dealing with microarray datasets, their application requires huge computational resources. As far as we know, there is no public software application that can provide the computing capabilities required for carrying out the task of data imputation. This paper presents a new public tool for missing data imputation that is attached to a computer cluster in order to execute computationally intensive tasks. The software WIMP (Web IMPutation) is a publicly available website where registered users can create, execute, analyze and store their simulations related to missing data imputation.
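To make the idea of missing data imputation concrete, the following Python sketch fills each missing entry with the mean of the observed values in its column. The abstract does not say which imputation methods WIMP offers, so column-mean imputation is used here only as the simplest possible stand-in; the function name and the use of `None` to mark missing entries are assumptions.

```python
from typing import List, Optional

Row = List[Optional[float]]


def impute_column_means(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry (None) with the mean of the observed
    values in the same column. Raises if a column has no observed value."""
    if not rows:
        return []
    n_cols = len(rows[0])
    means: List[float] = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    # Build a new table so the input data is left untouched.
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


if __name__ == "__main__":
    data = [[1.0, None],
            [3.0, 4.0],
            [None, 8.0]]
    print(impute_column_means(data))
```

More elaborate techniques replace the column mean with a model-based estimate, which is where the computational cost mentioned in the abstract comes from.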

International Conference on Artificial Neural Networks | 2011

Hybrid (generalization-correlation) method for feature selection in high dimensional DNA microarray prediction problems

Yasel Couce; Leonardo Franco; Daniel Urda; José Luis Subirats; José M. Jerez

Microarray data analysis is attracting increasing attention in computer science because of the many applications of machine learning methods in prediction problems. The process typically involves a feature selection step, which is important for increasing the accuracy and speed of the classifiers. This work analyzes the characteristics of the features selected by two wrapper methods, the first based on artificial neural networks (ANN) and the second on a novel constructive neural network (CNN) algorithm, and then proposes a hybrid model that combines the advantages of wrapper and filter methods. The results obtained, in terms of the computational costs involved and the prediction accuracy reached, show the feasibility of the hybrid model proposed here and indicate an interesting research line for the near future.

International Conference on Artificial Neural Networks | 2013

A constructive neural network to predict pitting corrosion status of stainless steel

Daniel Urda; Rafael Marcos Luque; Maria Jesus Jiménez; Ignacio J. Turias; Leonardo Franco; José M. Jerez

The main consequences of corrosion are the costs derived both from maintenance tasks and from public safety protection. In this sense, artificial intelligence models are used to determine the pitting corrosion behaviour of stainless steel. This work presents the C-MANTEC constructive neural network algorithm as an automatic system to determine the pitting corrosion status of that alloy. Several classification techniques are compared with our proposal: Linear Discriminant Analysis, k-Nearest Neighbor, Multilayer Perceptron, Support Vector Machines and Naive Bayes. The results obtained show the robustness and higher performance of the C-MANTEC algorithm in comparison to the other artificial intelligence models, corroborating the utility of the constructive neural network paradigm in modelling the pitting corrosion problem.

International Conference on Artificial Neural Networks | 2013

Committee C-mantec: a probabilistic constructive neural network

José Luis Subirats; Rafael Marcos Luque-Baena; Daniel Urda; Francisco Ortega-Zamorano; José M. Jerez; Leonardo Franco

C-Mantec is a recently introduced constructive algorithm that generates compact neural architectures with good generalization abilities. Nevertheless, it produces a discrete output value, and this might be a drawback in certain situations. In this work we propose two approaches to obtain a continuous-output network, such that the output can be interpreted as the probability of a given pattern belonging to one of the output classes. The CC-Mantec approach utilizes a committee strategy, and the results obtained both with the XOR Boolean function and with a set of benchmark functions show the suitability of the approach, as an improvement over the standard C-Mantec algorithm is obtained in almost all cases.
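The general idea of turning discrete classifier outputs into a probability-like value with a committee can be sketched in Python as follows. This is not the CC-Mantec algorithm, whose details are not given in the abstract; it is a generic vote-averaging committee, and the threshold units, their weights and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

# A committee member maps an input pattern to a discrete label, 0 or 1.
Classifier = Callable[[Sequence[float]], int]


def make_threshold_unit(weights: Sequence[float], bias: float) -> Classifier:
    """Build a simple threshold unit that outputs 1 when w.x + b > 0."""
    def unit(x: Sequence[float]) -> int:
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if activation > 0 else 0
    return unit


def committee_probability(members: List[Classifier],
                          x: Sequence[float]) -> float:
    """Fraction of committee members voting for class 1, read as an
    estimate of the probability that x belongs to class 1."""
    if not members:
        raise ValueError("the committee needs at least one member")
    votes = [member(x) for member in members]
    return sum(votes) / len(votes)


if __name__ == "__main__":
    committee = [make_threshold_unit([1.0], -0.5),
                 make_threshold_unit([1.0], -0.9),
                 make_threshold_unit([1.0], -1.5)]
    print(committee_probability(committee, [1.0]))
```

Each member still produces a discrete output; only the aggregate over the committee is continuous, which is the property the abstract is after.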

International Symposium on Neural Networks | 2017

Machine learning models to search relevant genetic signatures in clinical context

Daniel Urda; Rafael Marcos Luque-Baena; Leonardo Franco; José M. Jerez; Noelia Sánchez-Maroño

Clinicians are interested in the estimation of robust and relevant genetic signatures from gene sequencing data. Many machine learning approaches have been proposed to address well-known issues of this complex task (feature or gene selection, classification or model selection, and prediction assessment). Addressing this problem often requires a deep knowledge of these methods, and some of them demand high computational resources that may not be affordable. In this paper, an exhaustive study that includes different types of feature selection methods and classifiers is presented, providing clinicians with a useful insight into the most suitable methods for this purpose. Prediction assessment is performed using a bootstrap cross-validation strategy as an honest validation scheme. The results of this study for six benchmark datasets show that filter or embedded methods are, in general, preferable to wrapper methods, given their statistically significantly better accuracy and lower demand for computational resources.
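A bootstrap-based validation scheme of the kind mentioned in the abstract can be sketched in Python as below. The abstract does not specify the exact protocol, so this sketch assumes a plain out-of-bag estimate: each round trains on a sample drawn with replacement and tests on the samples left out. The nearest-mean classifier and all names are hypothetical placeholders for whichever feature selection and classification pipeline is being assessed.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Model = Callable[[Sequence[float]], int]


def train_nearest_mean(samples: List[Sample]) -> Model:
    """Toy classifier: predict the class whose centroid is closest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in samples:
        counts[y] = counts.get(y, 0) + 1
        previous = sums.get(y, [0.0] * len(x))
        sums[y] = [p + xi for p, xi in zip(previous, x)]
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x: Sequence[float]) -> int:
        return min(centroids, key=lambda y: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def bootstrap_accuracy(data: List[Sample],
                       train: Callable[[List[Sample]], Model],
                       n_rounds: int = 100,
                       seed: int = 0) -> float:
    """Mean out-of-bag accuracy over n_rounds bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    scores: List[float] = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        chosen = set(idx)
        held_out = [data[i] for i in range(n) if i not in chosen]
        if not held_out:
            continue  # every sample was drawn; nothing left to test on
        model = train([data[i] for i in idx])
        correct = sum(1 for x, y in held_out if model(x) == y)
        scores.append(correct / len(held_out))
    if not scores:
        raise ValueError("no round produced held-out samples")
    return sum(scores) / len(scores)
```

Because the held-out samples never take part in training within a round, any feature selection step should be run inside `train` rather than once on the full dataset; otherwise the estimate is optimistically biased.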
