Jorge M. Arevalillo
National University of Distance Education
Publication
Featured research published by Jorge M. Arevalillo.
Cancer Research | 2015
Angelo Gámez-Pozo; Julia Berges-Soria; Jorge M. Arevalillo; Paolo Nanni; Rocío López-Vacas; Hilario Navarro; Jonas Grossmann; Carlos A. Castaneda; Paloma Main; Mariana Díaz-Almirón; Enrique Espinosa; Eva Ciruelos; Juan Ángel Fresno Vara
Better knowledge of the biology of breast cancer has allowed the use of new targeted therapies, leading to improved outcome. High-throughput technologies allow deepening into the molecular architecture of breast cancer, integrating different levels of information, which is important if it helps in making clinical decisions. microRNA (miRNA) and protein expression profiles were obtained from 71 estrogen receptor-positive (ER(+)) and 25 triple-negative breast cancer (TNBC) samples. RNA and proteins obtained from formalin-fixed, paraffin-embedded tumors were analyzed by RT-qPCR and LC/MS-MS, respectively. We applied probabilistic graphical models representing complex biologic systems as networks, confirming that ER(+) and TNBC subtypes are distinct biologic entities. The integration of miRNA and protein expression data unravels molecular processes that can be related to differences in the genesis and clinical evolution of these types of breast cancer. Our results confirm that TNBC has a unique metabolic profile that may be exploited for therapeutic intervention.
Computers in Biology and Medicine | 2013
Jorge M. Arevalillo; Hilario Navarro
An important issue in the analysis of gene expression microarray data is concerned with the extraction of valuable genetic interactions from high dimensional data sets containing gene expression levels collected for a small sample of assays. Past and ongoing research efforts have been focused on biomarker selection for phenotype classification. Usually, many genes convey useless information for classifying the outcome and should be removed from the analysis; on the other hand, some of them may be highly correlated, which reveals the presence of redundant expressed information. In this paper we propose a method for the selection of highly predictive genes having a low redundancy in their expression levels. The predictive accuracy of the selection is assessed by means of Classification and Regression Trees (CART) models which enable assessment of the performance of the selected genes for classifying the outcome variable and will also uncover complex genetic interactions. The method is illustrated throughout the paper using a public domain colon cancer gene expression data set.
Journal of Multivariate Analysis | 2012
Jorge M. Arevalillo; Hilario Navarro
This paper is concerned with the role some parameters indexing four important families within the multivariate elliptically contoured distributions play as indicators of multivariate kurtosis. The problem is addressed for the exponential power family, for a subclass of the Kotz family and for the Pearson type II and type VII distributions. Once such a problem is analyzed, we study the effect these parameters have, as kurtosis indicators, on binary discriminant analysis by exploring their relationship with the error rate of the Bayes discriminant rule. The effect is analyzed under mild conditions on the kernel function generating the elliptical density. Some numerical examples are given in order to illustrate our theoretical insights and findings.
BMC Bioinformatics | 2011
Jorge M. Arevalillo; Hilario Navarro
Background: One of the drawbacks we face when analyzing gene to phenotype associations in genomic data is the poor performance of the designed classifier due to the small sample-high dimensional data structures (n ≪ p) at hand. This is known as the peaking phenomenon, a common situation in the analysis of gene expression data. Highly predictive bivariate gene interactions whose marginals are useless for discrimination are also affected by this phenomenon, so they are commonly discarded by state of the art sequential search algorithms. Such patterns are known as weak marginal/strong bivariate interactions. This paper addresses the problem of uncovering them in high dimensional settings.
Results: We propose a new approach which uses quadratic discriminant analysis (QDA) as a search engine in order to detect such signals. The choice of QDA is justified by a simulation study for a benchmark of classifiers which reveals its appealing properties. The procedure rests on an exhaustive search which explores the feature space in a blockwise manner by dividing it into blocks and by assessing the accuracy of the QDA for the predictors within each pair of blocks; the block size is determined by the resistance of the QDA to peaking. This search highlights chunks of features which are expected to contain the type of subtle interactions we are concerned with; a closer look at this smaller subset of features, by means of an exhaustive search guided by the QDA error rate for all the pairwise input combinations within this subset, will enable their final detection. The proposed method is applied both to synthetic data and to a public domain microarray data set. When applied to gene expression data, it leads to pairs of genes which are not univariate differentially expressed but exhibit subtle patterns of bivariate differential expression.
Conclusions: We have proposed a novel approach for identifying weak marginal/strong bivariate interactions. Unlike standard approaches such as the top scoring pair (TSP) and the CorScor, our procedure does not assume a specified shape of phenotype separation and may enrich the type of bivariate differential expression patterns that can be uncovered in high dimensional data.
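The final stage described in this abstract, an exhaustive pairwise search guided by the QDA error rate, can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it skips the blockwise pre-screening, uses the resubstitution error as a stand-in for whatever error estimate the paper uses, and the names `gaussian_fit`, `log_density`, `qda_error`, `rank_pairs` and `make_demo` are hypothetical. The synthetic data place the only signal in features 0 and 1, whose marginal distributions are nearly identical in both classes while their correlation has opposite sign.

```python
import itertools
import math
import random


def gaussian_fit(rows):
    """Mean and 2x2 covariance (with a tiny ridge) of a list of 2-D points."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1) + 1e-6
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1) + 1e-6
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(pt, mean, cov):
    """Log of a bivariate normal density, up to an additive constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = pt[0] - mean[0], pt[1] - mean[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + maha)


def qda_error(X, y, i, j):
    """Resubstitution error of a two-class QDA on features (i, j).

    Assumption: resubstitution is used only to keep the sketch short.
    """
    pts = [(row[i], row[j]) for row in X]
    fits = {}
    for c in (0, 1):
        members = [p for p, label in zip(pts, y) if label == c]
        prior = math.log(len(members) / len(y))
        fits[c] = gaussian_fit(members) + (prior,)
    wrong = 0
    for p, label in zip(pts, y):
        scores = {c: log_density(p, m, cov) + prior
                  for c, (m, cov, prior) in fits.items()}
        wrong += max(scores, key=scores.get) != label
    return wrong / len(y)


def rank_pairs(X, y, features):
    """Exhaustive search: every feature pair, sorted by QDA error."""
    return sorted((qda_error(X, y, i, j), i, j)
                  for i, j in itertools.combinations(features, 2))


def make_demo(n=200, p=6, seed=0):
    """Synthetic data (hypothetical): only the pair (0, 1) is informative."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        row = [rng.gauss(0, 1) for _ in range(p)]
        sign = 1 if label == 0 else -1
        # Same marginal in both classes, opposite correlation with feature 0.
        row[1] = sign * row[0] + 0.3 * rng.gauss(0, 1)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_demo()
ranking = rank_pairs(X, y, range(6))
print(ranking[:3])  # pairs with the lowest QDA error come first
```

Because QDA fits a separate covariance matrix per class, it can separate the two classes through their correlation structure alone, which is why it suits patterns that no single feature reveals.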
Scientific Reports | 2017
Angelo Gámez-Pozo; Lucia Trilla-Fuertes; Julia Berges-Soria; Nathalie Selevsek; Rocío López-Vacas; Mariana Díaz-Almirón; Paolo Nanni; Jorge M. Arevalillo; Hilario Navarro; Jonas Grossmann; Francisco Gayá Moreno; Rubén Gómez Rioja; Guillermo Prado-Vazquez; Andrea Zapater-Moros; Paloma Main; Jaime Feliu; Purificación Martínez del Prado; Pilar Zamora; Eva Ciruelos; Enrique Espinosa; Juan Ángel Fresno Vara
Breast cancer is a heterogeneous disease comprising a variety of entities with various genetic backgrounds. Estrogen receptor-positive, human epidermal growth factor receptor 2-negative tumors typically have a favorable outcome; however, some patients eventually relapse, which suggests some heterogeneity within this category. In the present study, we used proteomics and miRNA profiling techniques to characterize a set of 102 either estrogen receptor-positive (ER+)/progesterone receptor-positive (PR+) or triple-negative formalin-fixed, paraffin-embedded breast tumors. Protein expression-based probabilistic graphical models and flux balance analyses revealed that some ER+/PR+ samples had a protein expression profile similar to that of triple-negative samples and had a clinical outcome similar to those with triple-negative disease. This probabilistic graphical model-based classification had prognostic value in patients with luminal A breast cancer. This prognostic information was independent of that provided by standard genomic tests for breast cancer, such as MammaPrint, OncoType Dx and the 8-gene Score.
Knowledge Discovery and Data Mining | 2009
Jorge M. Arevalillo; Hilario Navarro
Random Forests (RF) is an ensemble method which has become widely accepted within the machine learning and bioinformatics communities in the last few years. Its predictive strength, along with some of the ingredients, rich in information, provided by the output, has made RF an efficient data mining tool for discovering patterns in data. In this paper we review the learning mechanism of RF within the classification setting and apply it to uncover bivariate interactions, carrying useful information about an outcome, in high dimensional low sample data. We propose a divide and conquer search strategy in the variable space that benefits from the ranking of variable importances of RF at a first stage, along with the out of bag (oob) error rate of the ensemble at a second stage. The procedure combines both elements in order to capture difficult to uncover patterns in this type of data. We will show the performance of our procedure in some synthetic scenarios and will give a real application to a microarray data set in order to illustrate how it works.
Statistics & Probability Letters | 2003
Jorge M. Arevalillo
This paper is concerned with the inversion of a saddlepoint approximation for the tail probability of an asymptotically Normal statistic with cumulants expandable in powers of n^{-1/2}. The inversion yields an approximation for the quantile of the distribution of the statistic that is compared, both theoretically and numerically, with other well-known approximations, such as the normal one and the second-order Cornish-Fisher expansion.
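For orientation, the two baseline approximations named in this abstract can be sketched as below. This does not reproduce the paper's saddlepoint inversion. It assumes the commonly quoted form of the Cornish-Fisher expansion for a standardized statistic, with correction terms in the skewness and excess kurtosis; whether this matches exactly what the paper calls "second-order" is an assumption. The function name `cornish_fisher_quantile` is hypothetical.

```python
from statistics import NormalDist


def cornish_fisher_quantile(p, skew, excess_kurt):
    """Approximate p-quantile of a standardized statistic.

    Assumed form: z + (z^2 - 1) g1/6 + (z^3 - 3z) g2/24 - (2z^3 - 5z) g1^2/36,
    where z is the normal quantile, g1 the skewness, g2 the excess kurtosis.
    """
    z = NormalDist().inv_cdf(p)  # the plain normal approximation
    return (z
            + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * excess_kurt / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)


# Usage: standardized mean of n i.i.d. Exp(1) variables, for which
# skewness = 2 / sqrt(n) and excess kurtosis = 6 / n.
n = 10
normal_q = NormalDist().inv_cdf(0.95)
cf_q = cornish_fisher_quantile(0.95, 2 / n ** 0.5, 6 / n)
print(normal_q, cf_q)
```

With zero skewness and zero excess kurtosis the correction terms vanish and the expansion reduces to the normal quantile.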
Fundamenta Informaticae | 2011
Jorge M. Arevalillo; Hilario Navarro
Random Forests (RF) is an ensemble technology for classification and regression which has become widely accepted in the bioinformatics community in the last few years. Its predictive strength, along with some of the utilities, rich in information, provided by the output, has made RF an efficient data mining tool for discovering patterns in high dimensional data. In this paper we propose a search strategy that explores a subset of the input space in an exhaustive way using RF as the search engine. Our procedure begins by taking the variables previously rejected by a sequential search procedure and uses the out of bag error rate of the ensemble, obtained when trained over an augmented data set, as criterion to capture difficult to uncover bivariate patterns associated with an outcome variable. We will show the performance of the procedure in some synthetic scenarios and will give an application to a real microarray experiment in order to illustrate how it works for gene expression data.
bioRxiv | 2018
Lucia Trilla-Fuertes; Andrea Zapater-Moros; Angelo Gámez-Pozo; Jorge M. Arevalillo; Guillermo Prado-Vazquez; Mariana Díaz-Almirón; Maria Ferrer-Gomez; Rocío López-Vacas; Hilario Navarro; Enrique Espinosa; Paloma Main; Juan Ángel Fresno Vara
Breast cancer is a heterogeneous disease. In clinical practice, tumors are classified as hormonal receptor positive, Her2 positive and triple negative tumors. In previous works, our group defined a new hormonal receptor positive subgroup, the TN-like subtype, which has a prognosis and a molecular profile more similar to triple negative tumors. In this study, proteomics and Bayesian networks were used to characterize protein relationships in 106 breast tumor samples. Components obtained by these methods had a clear functional structure. The analysis of these components suggested differences in processes such as metastasis or proliferation between breast cancer subtypes, including our new subtype TN-like. In addition, one of the components, mainly related to metastasis, had prognostic value in this cohort. Functional approaches allow hypotheses to be built about regulatory mechanisms and new relationships to be established among proteins in the breast cancer context.
Author Summary: Breast cancer classification in clinical practice is defined by three biomarkers (estrogen receptor, progesterone receptor and HER2) into hormone receptor positive, HER2+ and triple negative breast cancer (TNBC). Our group recently described a new ER+ subtype with molecular characteristics and prognosis similar to TNBC. In this study we propose a mathematical method, Bayesian networks, as a useful tool to study protein interactions and differential biological processes in breast cancer subtypes, characterizing differences in relevant processes such as proliferation or metastasis and associating them with patient prognosis.
bioRxiv | 2018
Lucia Trilla-Fuertes; Angelo Gámez-Pozo; Jorge M. Arevalillo; Guillermo Prado-Vazquez; Andrea Zapater-Moros; Mariana Díaz-Almirón; Hilario Navarro; Paloma Main; Enrique Espinosa; Pilar Zamora; Juan Ángel Fresno Vara
Metabolomics has great potential in the development of new biomarkers in cancer. In this study, metabolomics and gene expression data from breast cancer tumor samples were analyzed, using (1) probabilistic graphical models to define associations using quantitative data without other a priori information; and (2) Flux Balance Analysis and flux activities to characterize differences in metabolic pathways. On the one hand, both analyses highlighted the importance of glutamine in breast cancer. Moreover, cell experiments showed that treating breast cancer cells with drugs targeting glutamine metabolism significantly affects cell viability. On the other hand, these computational methods suggested some hypotheses and have demonstrated their utility in the analysis of metabolomics data and in associating metabolomics with the patient’s clinical outcome.
A metabolite network was built through the use of probabilistic graphical models. Interestingly, the metabolites were organized into metabolic pathways in this network, thus it was possible to establish differences between breast cancer subtypes at the metabolic pathway level. Additionally, the lipid metabolism node had prognostic value. A second network associating gene expression with metabolites was built. Associations were established between the biological functions of genes and the metabolites included in each node. A third network combined flux activities from Flux Balance Analysis and metabolomics data, showing coherence between the metabolic pathways of the flux activities and the metabolites in each branch. In this study, probabilistic graphical models were valuable for the analysis of metabolomics data from a functional point of view, allowing new hypotheses in metabolomics and associating metabolomics data with the patient’s clinical outcome.
Author summary: Metabolomics is a promising technique to describe new biomarkers in cancer. In this study we proposed computational methods to manage this type of data and associate it with gene expression data. We also employed a metabolic computational model to compare predictions from this model with metabolomics measurements. Finally, we built predictors of relapse based on the integration of those high-dimensional data in breast cancer patients.