Publication


Featured research published by Ramon Diaz-Uriarte.


BMC Bioinformatics | 2006

Gene selection and classification of microarray data using random forest

Ramon Diaz-Uriarte; Sara Alvarez de Andrés

Background: Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection.
Results: We investigate the use of random forest for classification of microarray data (including multi-class problems) and propose a new method of gene selection in classification problems based on random forest. Using simulated and nine microarray data sets we show that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.
Conclusion: Because of its performance and features, random forest and gene selection using random forest should probably become part of the standard tool-box of methods for class prediction and gene selection with microarray data.
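The gene-selection idea in this abstract (rank genes by random-forest variable importance, then shrink the set while watching prediction error) can be sketched as a backward-elimination loop. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the `fit_and_score` callback (assumed to fit a forest and return its out-of-bag error plus per-gene importances), the 20% drop fraction, the re-ranking at every round, and the `tolerance` rule for picking the final set are all illustrative choices, and `toy_scorer` is a hypothetical stand-in for a real forest.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: fit a model on the given genes and return
# (estimated error, {gene: importance}). In practice this would wrap a
# random forest and report its out-of-bag error and variable importances.
Scorer = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_elimination(genes: Sequence[str], fit_and_score: Scorer,
                         drop_frac: float = 0.2,
                         min_genes: int = 2) -> List[Tuple[List[str], float]]:
    """Repeatedly drop the least important fraction of genes.

    Returns the elimination path as (gene_set, error) pairs.
    """
    current = list(genes)
    path = []
    while True:
        error, importance = fit_and_score(current)
        path.append((list(current), error))
        if len(current) <= min_genes:
            break
        # Drop at least one gene per round, but never go below min_genes.
        n_drop = max(1, int(len(current) * drop_frac))
        n_keep = max(min_genes, len(current) - n_drop)
        # Re-rank with the importances from this round's fit (an assumption).
        ranked = sorted(current, key=lambda g: importance[g], reverse=True)
        current = ranked[:n_keep]
    return path


def smallest_within_tolerance(path, tolerance: float = 0.0):
    """Smallest gene set whose error is within `tolerance` of the best error."""
    best = min(err for _, err in path)
    ok = [(genes, err) for genes, err in path if err <= best + tolerance]
    return min(ok, key=lambda item: len(item[0]))


def toy_scorer(genes):
    """Toy stand-in for a forest: g0 and g1 are informative, the rest are noise."""
    informative = {"g0", "g1"}
    hits = len(informative & set(genes))
    error = 0.40 - 0.15 * hits + 0.005 * len(genes)
    importance = {g: 1.0 if g in informative else 0.1 / (1 + int(g[1:]))
                  for g in genes}
    return error, importance


path = backward_elimination([f"g{i}" for i in range(10)], toy_scorer)
selected, err = smallest_within_tolerance(path)
print(sorted(selected), round(err, 3))  # ['g0', 'g1'] 0.11
```

Whether importances are recomputed after each elimination step is a design choice with overfitting implications; this sketch recomputes them only because it is the simplest loop to write, not because that is the published default.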


BMC Bioinformatics | 2007

GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest.

Ramon Diaz-Uriarte

Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information on selected genes. Methodologically, such a tool would be of greater use if it incorporated state-of-the-art computational approaches and made its source code available.
Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, allowing the software to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrapped estimates of prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, allowing others to extend the software. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN.
Conclusion: varSelRF and GeneSrF implement a validated method for gene selection, including bootstrap estimates of classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology (parallelization combined with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.
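One point in this abstract, avoiding selection bias when estimating error, is easy to get wrong: the gene-selection step has to be repeated inside every bootstrap resample rather than run once on the full data. The Python sketch below illustrates that structure with a plain out-of-bag bootstrap average and a deliberately simple, hypothetical two-class trainer (`select_then_centroid`: keep the top-`k` genes by class-mean gap, then classify by nearest centroid). It is not necessarily the estimator GeneSrF or varSelRF report, and all helper names are illustrative.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]
# A trainer receives only the resampled training data, so any gene
# selection it performs cannot see the held-out (out-of-bag) samples.
Trainer = Callable[[Sequence[Row], Sequence[int]], Predictor]


def select_then_centroid(k: int = 1) -> Trainer:
    """Hypothetical two-class trainer: top-k genes by class-mean gap, then nearest centroid."""
    def train(X: Sequence[Row], y: Sequence[int]) -> Predictor:
        classes = sorted(set(y))
        if len(classes) < 2:  # degenerate resample: predict the only class seen
            only = classes[0]
            return lambda row: only
        c0, c1 = classes[0], classes[1]
        rows0 = [x for x, lab in zip(X, y) if lab == c0]
        rows1 = [x for x, lab in zip(X, y) if lab == c1]

        def mean(rows, j):
            return sum(r[j] for r in rows) / len(rows)

        n_genes = len(X[0])
        gap = {j: abs(mean(rows0, j) - mean(rows1, j)) for j in range(n_genes)}
        keep = sorted(range(n_genes), key=lambda j: gap[j], reverse=True)[:k]
        cent0 = {j: mean(rows0, j) for j in keep}
        cent1 = {j: mean(rows1, j) for j in keep}

        def predict(row: Row) -> int:
            d0 = sum((row[j] - cent0[j]) ** 2 for j in keep)
            d1 = sum((row[j] - cent1[j]) ** 2 for j in keep)
            return c0 if d0 <= d1 else c1
        return predict
    return train


def bootstrap_oob_error(X, y, train: Trainer, n_boot: int = 50, seed: int = 0) -> float:
    """Average out-of-bag error; selection is redone inside every resample."""
    rng = random.Random(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        predict = train([X[i] for i in idx], [y[i] for i in idx])
        wrong = sum(predict(X[i]) != y[i] for i in oob)
        errors.append(wrong / len(oob))
    return sum(errors) / len(errors) if errors else float("nan")
```

Running selection once on all samples and then bootstrapping only the classifier would let the held-out samples influence which genes are kept, which typically makes the error estimate optimistic.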


Molecular & Cellular Proteomics | 2009

Identification of tumor-associated autoantigens for the diagnosis of colorectal cancer in serum using high density protein microarrays.

Ingrid Babel; Rodrigo Barderas; Ramon Diaz-Uriarte; Jorge Luis Martínez-Torrecuadrada; Marta Sanchez-Carbayo; J. Ignacio Casal

There is mounting evidence for the existence of autoantibodies associated with cancer progression. Antibodies are the target of choice for serum screening because of their stability and suitability for sensitive immunoassays. Using commercial protein microarrays containing 8000 human proteins, we examined 20 sera from colorectal cancer (CRC) patients and healthy subjects to identify autoantibody patterns and associated antigens. Forty-three proteins were differentially recognized by tumoral and reference sera (p value < 0.04) in the protein microarrays. Five immunoreactive antigens, PIM1, MAPKAPK3, STK4, SRC, and FGFR4, showed the highest prevalence in cancer samples, whereas ACVR2B was more abundant in normal sera. Three of them, PIM1, MAPKAPK3, and ACVR2B, were used for further validation. A significant increase in the expression level of these antigens in CRC cell lines and colonic mucosa was confirmed by immunoblotting and by immunohistochemistry on tissue microarrays. A diagnostic ELISA based on the combination of MAPKAPK3 and ACVR2B proteins yielded specificity and sensitivity values of 73.9% and 83.3%, respectively (area under the curve, 0.85), for CRC discrimination in an independent sample set of 94 sera representative of different stages of progression and control subjects. In summary, these studies confirmed the presence of specific autoantibodies for CRC and revealed new individual markers of disease (PIM1, MAPKAPK3, and ACVR2B) with the potential to diagnose CRC with higher specificity and sensitivity than previously reported serum biomarkers.
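The diagnostic figures quoted in this abstract (and in the related colorectal cancer studies listed here) are sensitivity, TP / (TP + FN); specificity, TN / (TN + FP); and area under the ROC curve, which equals the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney formulation, ties counting one half). A minimal Python sketch of those definitions, on made-up scores rather than any data from the study:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """labels: 1 = case (e.g. cancer serum), 0 = control. score >= threshold calls a case."""
    tp = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s >= threshold)
    fn = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < threshold)
    tn = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s < threshold)
    fp = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]  # hypothetical ELISA-like readings
labels = [1, 1, 0, 1, 0, 0]
print(sensitivity_specificity(scores, labels, threshold=0.5))  # both equal 2/3
print(auc(scores, labels))  # 8/9
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why the abstracts report both.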


Molecular & Cellular Proteomics | 2011

Identification of MST1/STK4 and SULF1 proteins as autoantibody targets for the diagnosis of colorectal cancer by using phage microarrays

Ingrid Babel; Rodrigo Barderas; Ramon Diaz-Uriarte; Victor Moreno; Adolfo Suárez; María Jesús Fernández-Aceñero; Ramon Salazar; Gabriel Capellá; J. Ignacio Casal

The characterization of the humoral response in cancer patients is becoming a practical alternative for improving early detection. We prepared phage microarrays containing colorectal cancer cDNA libraries to identify phage-expressed peptides recognized by tumor-specific autoantibodies from patient sera. Of a total of 1536 printed phages, 128 discriminated cancer patients from control samples with statistical significance. Of these, 43 peptide sequences were unique after DNA sequencing. Six phages containing sequences homologous to STK4/MST1, SULF1, NHSL1, SREBF2, GRN, and GTF2I were selected to build a predictor panel. A previous study with high-density protein microarrays had identified STK4/MST1 as a candidate biomarker. An independent collection of 153 serum samples (50 colorectal cancer sera and 103 reference samples, including healthy donors and sera from other related pathologies) was used as a validation set to study prediction capability. A combination of four phages and two recombinant proteins, corresponding to MST1 and SULF1, achieved an area under the curve of 0.86 for discriminating cancer from healthy sera. Including sera from other neoplasias did not significantly change this value. For early stages (A+B), the corrected area under the curve was 0.786. Moreover, we have demonstrated that the MST1 and SULF1 proteins, homologous to the phage-peptide sequences, can replace the original phages in the predictor panel, improving its diagnostic accuracy.


Journal of Proteomics | 2012

An optimized predictor panel for colorectal cancer diagnosis based on the combination of tumor-associated antigens obtained from protein and phage microarrays

Rodrigo Barderas; Ingrid Babel; Ramon Diaz-Uriarte; Victor Moreno; Adolfo Suárez; Félix Bonilla; Roi Villar-Vázquez; Gabriel Capellá; J. Ignacio Casal

Humoral response in cancer patients appears early in cancer progression and can be used for diagnosis, including early detection. By using human recombinant protein and T7 phage microarrays displaying colorectal cancer (CRC)-specific peptides, we previously selected 6 phages and 6 human recombinant proteins as tumor-associated antigens (TAAs) with high diagnostic value. After completing validation in biological samples, TAAs were classified according to their correlation, redundancy in reactivity patterns and multiplex diagnostic capabilities. For predictor model optimization, TAAs were reanalyzed with a new set of samples. A combination of three phages displaying peptides homologous to GRN, NHSL1 and SREBF2 and four proteins PIM1, MAPKAPK3, FGFR4 and ACVR2B, achieved an area under the curve (AUC) of 94%, with a sensitivity of 89.1% and specificity of 90.0%, to correctly predict the presence of cancer. For early colorectal cancer stages, the AUC was 90%, with a sensitivity of 88.2% and specificity of 82.6%. In summary, we have defined an optimized predictor panel, combining TAAs from different sources, with highly improved accuracy and diagnostic value for colorectal cancer. This article is part of a Special Issue entitled: Translational Proteomics.


BMC Bioinformatics | 2015

Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling

Ramon Diaz-Uriarte

Background: Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. The purpose of this study is to conduct a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. I used simulated data sets (where the true restrictions are known) but, in contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess, for the first time, the effects of having to filter out passengers, of sampling schemes (when, how, and how many samples), and of deviations from order restrictions.
Results: Poor choices of method, filtering, and sampling lead to large errors in all performance measures. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy-to-use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single-cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance.
Conclusions: This paper provides practical recommendations for using these methods with experimental data. It also identifies key areas of future methodological work and, in particular, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.
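To make "restrictions in the order of accumulation" concrete: a restriction says a mutation can only be present if certain other mutations are already present. With one required predecessor per gene the restrictions form a tree (as in Oncogenetic Trees); when a gene requires all of several predecessors, the restriction is conjunctive (as in Conjunctive Bayesian Networks). The Python sketch below only checks cross-sectional genotypes against a given, hypothetical restriction map; it does not infer restrictions, which is what the compared methods do, and it ignores passengers.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical restriction map: gene -> set of genes that must already be
# mutated before it can appear. One parent per gene gives a tree; several
# parents that must ALL be present give a conjunctive restriction.
Restrictions = Dict[str, Set[str]]


def violations(genotype: Iterable[str], requires: Restrictions) -> List[str]:
    """Mutated genes whose required predecessors are not all mutated."""
    mutated = set(genotype)
    # Genes absent from `requires` are treated as unrestricted.
    return sorted(g for g in mutated if not requires.get(g, set()) <= mutated)


def fraction_compatible(samples: Iterable[Iterable[str]],
                        requires: Restrictions) -> float:
    """Share of cross-sectional samples consistent with the restrictions."""
    samples = list(samples)
    ok = sum(1 for s in samples if not violations(s, requires))
    return ok / len(samples)


requires = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(violations({"A", "B", "D"}, requires))  # ['D'] (D also needs C)
print(fraction_compatible([{"A"}, {"A", "B"}, {"B"}, {"A", "B", "C", "D"}],
                          requires))  # 0.75
```

Gene D in this toy map depends on two mutations, the kind of dependency the abstract notes Oncogenetic Trees cannot represent.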


International Journal of Cancer | 2015

Genome wide association study identifies a novel putative mammographic density locus at 1q12-q21.

Pablo Fernández-Navarro; Anna González-Neira; Guillermo Pita; Ramon Diaz-Uriarte; Leticia Tais Moreno; María Ederra; Carmen Pedraz-Pingarrón; Carmen Sánchez-Contador; Jose Antonio Vázquez-Carrete; Pilar Moreo; Carmen Vidal; Dolores Salas-Trejo; Jennifer Stone; Melissa C. Southey; John L. Hopper; Beatriz Pérez-Gómez; Javier Benitez; Marina Pollán

Mammographic density (MD) is an intermediate phenotype for breast cancer. Previous studies have identified genetic variants associated with MD; however, much of the genetic contribution to MD is unexplained. We conducted a two-stage genome-wide association analysis among the participants in the “Determinants of Density in Mammographies in Spain” study, together with a replication analysis in women from the Australian MD Twins and Sisters Study. Our discovery set covered a total of 3,351 Caucasian women aged 45 to 68 years, recruited from Spanish breast cancer screening centres. MD was blindly assessed by a single reader using Boyd's scale. A two-stage approach was employed, including a feature selection phase exploring 575,374 SNPs in 239 pairs of women with extreme phenotypes and a verification stage for the 183 selected SNPs in the remaining sample (2,873 women). Replication was conducted in 1,786 women aged 40 to 70 years recruited via the Australian Twin Registry, where MD was measured using Cumulus-3.0, assessing 14 SNPs with a p value < 0.10 in stage 2. Finally, two genetic variants in high linkage disequilibrium with our best hit were studied using the whole Spanish sample. Evidence of association with MD was found for variant rs11205277 (OR = 0.74; 95% CI = 0.67–0.81; p = 1.33 × 10⁻¹⁰). In replication analysis, only a marginal association between this SNP and absolute dense area was found. There was also evidence of association between MD and SNPs in high linkage disequilibrium with rs11205277: rs11205303 in gene MTMR11 (OR = 0.73; 95% CI = 0.66–0.80; p = 2.64 × 10⁻¹¹) and rs67807996 in gene OTUD7B (OR = 0.72; 95% CI = 0.66–0.80; p = 2.03 × 10⁻¹¹). Our findings provide additional evidence on common genetic variations that may contribute to MD.
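The association results above are reported as odds ratios (OR) with 95% confidence intervals; an OR below 1 whose interval excludes 1 (such as 0.74, 95% CI 0.67–0.81) indicates lower odds in the carrier group. The abstract does not say how those ORs were estimated (GWAS ORs usually come from regression models), so the Python sketch below only illustrates the generic relationship between an OR and its Wald interval, computed from a hypothetical 2x2 table of counts: OR = ad / bc, with the interval built on the log scale as log(OR) ± 1.96 · sqrt(1/a + 1/b + 1/c + 1/d).

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a 2x2 table (all counts > 0).

    a: group-1 carriers of the allele     b: group-2 carriers
    c: group-1 non-carriers               d: group-2 non-carriers
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


or_, lo, hi = odds_ratio_wald_ci(a=20, b=80, c=40, d=60)  # hypothetical counts
print(round(or_, 3), round(lo, 3), round(hi, 3))  # OR is 0.375; the CI lies below 1
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, which matches the shape of the intervals reported in the abstract.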


Bioinformatics | 2017

OncoSimulR: genetic simulation with arbitrary epistasis and mutator genes in asexual populations

Ramon Diaz-Uriarte

Summary: OncoSimulR implements forward-time genetic simulations of biallelic loci in asexual populations, with a special focus on cancer progression. Fitness can be defined as an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, restrictions in the order of accumulation of mutations, and order effects. Mutation rates can differ among genes and can be affected by (anti)mutator genes. Also available are sampling from simulations (including single-cell sampling), plotting the genealogical relationships of clones, and generating and plotting fitness landscapes.
Availability and Implementation: Implemented in R and C++, freely available from BioConductor for Linux, Mac and Windows under the GNU GPL license. Version 2.5.9 or higher available from: http://www.bioconductor.org/packages/devel/bioc/html/OncoSimulR.html. GitHub repository at: https://github.com/rdiaz02/OncoSimul
Supplementary information: Supplementary data are available at Bioinformatics online.
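To illustrate what "fitness as a function of genetic interactions, including epistasis" and "fitness landscapes" mean, here is a small Python sketch of a multiplicative fitness model with an extra term that applies only when a whole gene combination is mutated, plus enumeration of the resulting landscape. It is a generic illustration with made-up coefficients; it does not reproduce OncoSimulR's specification syntax or its exact rules for combining effects, which are documented with the package.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence


def fitness(genotype: Iterable[str], main: Dict[str, float],
            epistasis: Dict[FrozenSet[str], float]) -> float:
    """Multiplicative fitness: prod(1 + s_gene) * prod(1 + s_combo) for complete combos."""
    mutated = set(genotype)
    w = 1.0
    for g in mutated:
        w *= 1.0 + main.get(g, 0.0)
    for combo, s in epistasis.items():
        if combo <= mutated:  # interaction term applies only if all genes are mutated
            w *= 1.0 + s
    return w


def fitness_landscape(genes: Sequence[str], main: Dict[str, float],
                      epistasis: Dict[FrozenSet[str], float]):
    """Fitness of every genotype (all 2**n subsets of `genes`)."""
    return {frozenset(combo): fitness(combo, main, epistasis)
            for k in range(len(genes) + 1)
            for combo in combinations(genes, k)}


main = {"A": 0.1, "B": 0.1}              # hypothetical single-gene effects
epistasis = {frozenset({"A", "B"}): -0.3}  # hypothetical pairwise interaction
for genotype, w in sorted(fitness_landscape(["A", "B"], main, epistasis).items(),
                          key=lambda kv: len(kv[0])):
    print(sorted(genotype), round(w, 3))
# A and B each help alone (1.1), but together fall below wild type (0.847):
# reciprocal sign epistasis.
```

In a forward-time simulation, these genotype fitnesses would then set clone growth rates; that simulation layer is omitted here.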


Bioinformatics | 2014

ADaCGH2: parallelized analysis of (big) CNA data

Ramon Diaz-Uriarte

Motivation: Studies of genomic DNA copy number alteration can involve datasets with several million probes and thousands of subjects. Analyzing these data with currently available software (e.g. as available from BioConductor) can be extremely slow and may not be feasible because of memory requirements.
Results: We have developed a BioConductor package, ADaCGH2, that parallelizes the main segmentation algorithms (using forking on multicore computers or parallelization via message passing interface, etc., in clusters of computers) and uses ff objects for reading and data storage. We show examples of data with 6 million probes per array; we can analyze data that would otherwise not fit in memory, and, compared with the non-parallelized versions, we can achieve speedups of 25-40 times on a 64-core machine.
Availability and implementation: ADaCGH2 is an R package available from BioConductor. Version 2.3.11 or higher is available from the development branch: http://www.bioconductor.org/packages/devel/bioc/html/ADaCGH2.html.


Journal of Statistical Computation and Simulation | 2013

A Bayesian HMM with random effects and an unknown number of states for DNA copy number analysis

Oscar M. Rueda; Cristina Rueda; Ramon Diaz-Uriarte

Hidden Markov models (HMMs) have been shown to be a flexible tool for modelling complex biological processes. However, choosing the number of hidden states remains an open question, and the inclusion of random effects also deserves more research, as it is a recent addition to the fixed-effect HMM in many application fields. We present a Bayesian mixed HMM with an unknown number of hidden states and fixed covariates. The model is fitted using reversible-jump Markov chain Monte Carlo, avoiding the need to select the number of hidden states. We show through simulations that the estimates produced are more precise than those from a fixed-effect HMM and illustrate its practical application to the analysis of DNA copy number data, a field where HMMs are widely used.
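A basic building block of any HMM analysis of copy number data is evaluating the likelihood of a probe sequence given the model, usually with the forward algorithm. The Python sketch below does that for a fixed number of states with Gaussian emissions; it deliberately omits everything that distinguishes the paper's model (random effects, covariates, and an unknown number of states explored by reversible-jump MCMC). The three-state loss/normal/gain setup, its means, the standard deviation, and the 0.98 self-transition probability are illustrative assumptions.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def forward_loglik(obs: Sequence[float], init: Sequence[float],
                   trans: Sequence[Sequence[float]], means: Sequence[float],
                   sd: float) -> float:
    """Log-likelihood of `obs` under a Gaussian-emission HMM (scaled forward pass).

    init[s]: P(first state = s); trans[r][s]: P(next = s | current = r);
    state s emits Normal(means[s], sd).
    """
    k = len(init)
    alpha = [init[s] * normal_pdf(obs[0], means[s], sd) for s in range(k)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(k))
                     * normal_pdf(obs[t], means[s], sd) for s in range(k)]
        scale = sum(alpha)                   # P(obs[t] | obs[:t])
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
    return loglik


# Hypothetical 3-state model for log2 ratios: loss, normal, gain.
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.98 if r == s else 0.01 for s in range(3)] for r in range(3)]
means = [-0.5, 0.0, 0.5]
obs = [0.02, -0.01, 0.03, 0.48, 0.52, 0.47]  # made-up probe values
print(forward_loglik(obs, init, trans, means, sd=0.2))
```

Rescaling at every step keeps the recursion numerically stable for arrays with millions of probes, where raw forward probabilities would underflow.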

Collaboration


Top Co-Authors

Ingrid Babel (Spanish National Research Council)
J. Ignacio Casal (Spanish National Research Council)
Rodrigo Barderas (Complutense University of Madrid)
Félix Bonilla (Autonomous University of Madrid)
Victor Moreno (Autonomous University of Madrid)
Angel Zaballos (Spanish National Research Council)
Carmen Vidal (University of Barcelona)
Claudia Vasallo Vega (Spanish National Research Council)
Cristina Rueda (University of Valladolid)