Publication


Featured research published by Borja Calvo.


PLOS ONE | 2009

Differential Micro RNA Expression in PBMC from Multiple Sclerosis Patients

David Otaegui; Sergio E. Baranzini; Rubén Armañanzas; Borja Calvo; Maider Muñoz-Culla; Puya Khankhanian; Iñaki Inza; José Antonio Lozano; Tamara Castillo-Triviño; Ana Asensio; Javier Olaskoaga; Adolfo López de Munain

Differences in gene expression patterns have been documented not only in multiple sclerosis patients versus healthy controls but also in the relapse of the disease. Recently a new gene expression modulator has been identified: the microRNA, or miRNA. The aim of this work is to analyze the possible role of miRNAs in multiple sclerosis, focusing on the relapse stage. We have analyzed the expression patterns of 364 miRNAs in PBMC obtained from multiple sclerosis patients in relapse status, in remission status and healthy controls. The expression patterns of the miRNAs with significantly different expression were validated in an independent set of samples. In order to determine the effect of the miRNAs, the expression of some of their predicted target genes was studied by qPCR. Gene interaction networks were constructed in order to obtain a co-expression and multivariate view of the experimental data. The data analysis and later validation reveal that two miRNAs (hsa-miR-18b and hsa-miR-599) may be relevant at the time of relapse and that another miRNA (hsa-miR-96) may be involved in remission. The genes targeted by hsa-miR-96 are involved in immunological pathways such as interleukin signaling and in other pathways such as Wnt signaling. This work highlights the importance of miRNA expression in the molecular mechanisms implicated in the disease. Moreover, the proposed involvement of these small molecules in multiple sclerosis opens up a new therapeutic approach to explore and highlights some candidate biomarker targets in MS.


Pattern Recognition Letters | 2007

Learning Bayesian classifiers from positive and unlabeled examples

Borja Calvo; Pedro Larrañaga; José Antonio Lozano

The term positive unlabeled learning refers to the binary classification problem in the absence of negative examples. When only positive and unlabeled instances are available, semi-supervised classification algorithms cannot be directly applied, and thus new algorithms are required. One of these positive unlabeled learning algorithms is the positive naive Bayes (PNB), which is an adaptation of the naive Bayes induction algorithm that does not require negative instances. In this work we propose two ways of enhancing this algorithm. On the one hand, we have taken the concept behind PNB one step further, proposing a procedure to build more complex Bayesian classifiers in the absence of negative instances. We present a new algorithm (named positive tree augmented naive Bayes, PTAN) to obtain tree augmented naive Bayes models in the positive unlabeled domain. On the other hand, we propose a new Bayesian approach to deal with the a priori probability of the positive class that models the uncertainty over this parameter by means of a Beta distribution. This approach is applied to both PNB and PTAN, resulting in two new algorithms. The four algorithms are empirically compared in positive unlabeled learning problems based on real and synthetic databases. The results obtained in these comparisons suggest that, when the predicting variables are not conditionally independent given the class, the extension of PNB to more complex networks increases the classification performance. They also show that our Bayesian approach to the a priori probability of the positive class can improve the results obtained by PNB and PTAN.
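
To make the role of the a priori probability of the positive class concrete, here is a minimal Python sketch of how a negative-class conditional probability could be recovered without negative examples. It relies only on the law of total probability, P(x) = p * P(x | positive) + (1 - p) * P(x | negative), and assumes that the unlabeled sample reflects the overall population. The function name, the clipping step and the numbers are assumptions made for illustration; the abstract does not state the exact estimator used by PNB or PTAN.

```python
def negative_conditional(p_x_unlabeled, p_x_positive, prior_positive):
    """Estimate P(x | negative) from quantities available without negatives.

    p_x_unlabeled  -- P(x) estimated from the unlabeled examples
    p_x_positive   -- P(x | positive) estimated from the positive examples
    prior_positive -- assumed a priori probability p of the positive class
    """
    if not 0.0 <= prior_positive < 1.0:
        raise ValueError("prior_positive must lie in [0, 1)")
    # Solve P(x) = p * P(x | pos) + (1 - p) * P(x | neg) for P(x | neg).
    estimate = (p_x_unlabeled - prior_positive * p_x_positive) / (1.0 - prior_positive)
    # Sampling noise can push the estimate outside [0, 1]; clipping is an
    # assumption of this sketch, not a step taken from the paper.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical values: P(x | pos) = 0.8, P(x | neg) = 0.2, p = 0.3.
    p_x = 0.3 * 0.8 + 0.7 * 0.2
    print(negative_conditional(p_x, 0.8, 0.3))
```

The estimate depends directly on the assumed prior, which is one way to see why modelling the uncertainty over that parameter matters.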


Methods in Molecular Biology | 2010

Machine Learning: An Indispensable Tool in Bioinformatics

Iñaki Inza; Borja Calvo; Rubén Armañanzas; Endika Bengoetxea; Pedro Larrañaga; José Antonio Lozano

The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms, and the characteristics of main data preprocessing, supervised classification, and clustering techniques are shown. Feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics are presented. We make the interested reader aware of a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.


Nucleic Acids Research | 2008

Prioritization of candidate cancer genes—an aid to oncogenomic studies

Simon J. Furney; Borja Calvo; Pedro Larrañaga; José Antonio Lozano; Nuria Lopez-Bigas

Techniques for oncogenomic analyses, such as array comparative genomic hybridization, messenger RNA expression arrays and mutational screens, have come to the fore in modern cancer research. Studies utilizing these techniques are able to highlight panels of genes that are altered in cancer. However, these candidate cancer genes must then be scrutinized to reveal whether they contribute to oncogenesis or are coincidental and non-causative. We present a computational method for the prioritization of candidate (i) proto-oncogenes and (ii) tumour suppressor genes from oncogenomic experiments. We constructed computational classifiers using different combinations of sequence and functional data including sequence conservation, protein domains and interactions, and regulatory data. We found that these classifiers are able to distinguish between known cancer genes and other human genes. Furthermore, the classifiers also discriminate candidate cancer genes from a recent mutational screen from other human genes. We provide a web-based facility through which cancer biologists may access our results and we propose computational cancer gene classification as a useful method of prioritizing candidate cancer genes identified in oncogenomic studies.


Computer Methods and Programs in Biomedicine | 2013

A new measure for gene expression biclustering based on non-parametric correlation

Jose Luis Flores; Iñaki Inza; Pedro Larrañaga; Borja Calvo

BACKGROUND: Biclustering, one of the emerging techniques for analyzing DNA microarray data, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as a quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. RESULTS: The proposed measure, called Spearman's biclustering measure (SBM), estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed using an evolutionary technique called estimation of distribution algorithms, which uses the SBM as the fitness function. This approach has been examined from different points of view using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of reference bicluster patterns including new patterns, and a set of statistical tests. Performance has also been examined using real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. CONCLUSIONS: SBM shows several advantages, such as the ability to recognize more complex coherence patterns (shifting, scaling and inversion) and the capability to selectively marginalize genes and conditions depending on the statistical significance.
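
As a rough illustration of why a rank-based (non-parametric) correlation can pick up the shifting, scaling and inversion patterns mentioned in this abstract, the following Python sketch computes Spearman's coefficient between a hypothetical gene profile and shifted, scaled and inverted copies of it. This is not the SBM itself, whose definition is not given here; the function names and the expression values are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    gene = [1.0, 4.0, 2.0, 8.0, 5.0]      # hypothetical expression profile
    shifted = [v + 5.0 for v in gene]      # shifting pattern
    scaled = [3.0 * v for v in gene]       # scaling pattern
    inverted = [-v for v in gene]          # inverse relationship
    for name, other in [("shifted", shifted), ("scaled", scaled), ("inverted", inverted)]:
        print(name, spearman(gene, other))
```

Because only the ordering of the values matters, the shifted and scaled profiles correlate perfectly with the original and the inverted profile correlates perfectly negatively, so a measure built on the absolute value of the coefficient would treat all three as coherent.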


Computer Methods and Programs in Biomedicine | 2007

A partially supervised classification approach to dominant and recessive human disease gene prediction

Borja Calvo; Nuria Lopez-Bigas; Simon J. Furney; Pedro Larrañaga; José Antonio Lozano

The discovery of the genes involved in genetic diseases is a very important step towards the understanding of the nature of these diseases. In-lab identification is a difficult, time-consuming task, where computational methods can be very useful. In silico identification algorithms can be used as a guide in future studies. Previous work on this topic has not taken into account that no reliable sets of negative examples are available, as it is not possible to ensure that a given gene is not related to any genetic disease. In this paper, this feature of the problem is considered, and identification is approached as a partially supervised classification problem. In addition, we have taken a more specific approach to identifying disease genes by classifying, for the first time, genes causing dominant and recessive diseases independently. We base this separation on previous results that show that these two types of genes present differences in their sequence properties. In this paper, we have applied a new model averaging algorithm to the identification of human genes associated with both dominant and recessive Mendelian diseases.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2009

Microarray Analysis of Autoimmune Diseases by Machine Learning Procedures

Rubén Armañanzas; Borja Calvo; Iñaki Inza; Marcos López-Hoyos; Víctor Manuel Martínez-Taboada; Eduardo Úcar; Irantzu Bernales; Asier Fullaondo; Pedro Larrañaga; Ana M. Zubiaga

Microarray-based global gene expression profiling, together with the use of sophisticated statistical algorithms, is providing new insights into the pathogenesis of autoimmune diseases. We have applied a novel statistical technique for gene selection based on machine learning approaches to analyze microarray expression data gathered from patients with systemic lupus erythematosus (SLE) and primary antiphospholipid syndrome (PAPS), two autoimmune diseases of unknown genetic origin that share many common features. The methodology included a combination of three data discretization policies, a consensus gene selection method, and a multivariate correlation measurement. A set of 150 genes was found to discriminate SLE and PAPS patients from healthy individuals. Statistical validations demonstrate the relevance of this gene set from a univariate and multivariate perspective. Moreover, functional characterization of these genes identified an interferon-regulated gene signature, consistent with previous reports. It also revealed the existence of other regulatory pathways, including those regulated by PTEN, TNF, and BCL-2, which are altered in SLE and PAPS. Remarkably, a significant number of these genes carry E2F binding motifs in their promoters, projecting a role for E2F in the regulation of autoimmunity.


Pattern Recognition Letters | 2009

Feature subset selection from positive and unlabelled examples

Borja Calvo; Pedro Larrañaga; José Antonio Lozano

The feature subset selection problem has a growing importance in many machine learning applications where the number of variables is very high. There are a great number of algorithms that can approach this problem in supervised databases but, when examples from one or more classes are not available, supervised feature subset selection algorithms cannot be directly applied. One of these algorithms is correlation-based filter selection (CFS). In this work we propose an adaptation of this algorithm that can be applied when only positive and unlabelled examples are available. As far as we know, this is the first time the feature subset selection problem has been studied in the positive unlabelled learning context. We have tested this adaptation on synthetic datasets obtained by sampling Bayesian network models where we know which variables are (in)dependent of the class. We have also tested our adaptations on real-life databases where the absence of negative examples has been simulated. The results show that, given enough positive examples, it is possible to obtain good solutions to the feature subset selection problem when only positive and unlabelled instances are available.


European Conference on Machine Learning | 2014

Statistical hypothesis testing in positive unlabelled data

Konstantinos Sechidis; Borja Calvo; Gavin Brown

We propose a set of novel methodologies which enable valid statistical hypothesis testing when we have only positive and unlabelled (PU) examples. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we have three key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis activities; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and finally, (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will additionally be useful for information theoretic feature selection and Bayesian network structure learning.
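
The independence-testing setting in the first contribution can be sketched in a few lines of Python: a G-test (a generalised likelihood ratio test of independence) between a binary feature and a surrogate label that is 1 for labelled positives and 0 for every unlabelled example, that is, treating all unlabelled examples as negative. The function name and the data are hypothetical, and the p-value formula below holds only when both variables are binary (one degree of freedom). The power-analysis correction and supervision determination described in the abstract are not covered by this sketch.

```python
import math
from collections import Counter


def g_test_binary(feature, surrogate_label):
    """G-test of independence for two binary sequences of equal length.

    Returns (G statistic, p-value). The p-value uses the identity
    P(chi2_1 > g) = erfc(sqrt(g / 2)), valid for one degree of freedom only.
    """
    n = len(feature)
    joint = Counter(zip(feature, surrogate_label))
    feature_counts = Counter(feature)
    label_counts = Counter(surrogate_label)
    g = 0.0
    # Only cells with observed > 0 appear in the counter, so log is defined.
    for (x, s), observed in joint.items():
        expected = feature_counts[x] * label_counts[s] / n
        g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    return g, math.erfc(math.sqrt(g / 2.0))


if __name__ == "__main__":
    # Hypothetical data: surrogate label 1 = labelled positive, 0 = unlabelled.
    feature = [1] * 30 + [0] * 30
    surrogate = [1] * 25 + [0] * 5 + [1] * 5 + [0] * 25
    g, p = g_test_binary(feature, surrogate)
    print("G =", g, "p =", p)
```

A small p-value here suggests dependence between the feature and the surrogate label; according to the abstract, this is sufficient for testing independence from the true class, while estimating the power of the test needs the additional correction.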


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

A Preprocessing Procedure for Haplotype Inference by Pure Parsimony

Ekhine Irurozki; Borja Calvo; José Antonio Lozano

Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not helpful for finding the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can now be solved efficiently.

Collaboration


Dive into Borja Calvo's collaborations.

Top Co-Authors

José Antonio Lozano
University of the Basque Country

Pedro Larrañaga
Technical University of Madrid

Iñaki Inza
University of the Basque Country

Josu Ceberio
University of the Basque Country

Rubén Armañanzas
Technical University of Madrid

Christian Blum
Spanish National Research Council

Ekhine Irurozki
University of the Basque Country

Alexander Mendiburu
University of the Basque Country

Maria J. Blesa
Polytechnic University of Catalonia

Naiara G. Bediaga
University of the Basque Country