Juan Caballero
Barcelona Biomedical Research Park
Publications
Featured research published by Juan Caballero.
Science | 2014
Richard E. Green; Edward L. Braun; Joel Armstrong; Dent Earl; Ngan Nguyen; Glenn Hickey; Michael W. Vandewege; John St. John; Salvador Capella-Gutiérrez; Todd A. Castoe; Colin Kern; Matthew K. Fujita; Juan C. Opazo; Jerzy Jurka; Kenji K. Kojima; Juan Caballero; Robert Hubley; Arian Smit; Roy N. Platt; Christine Lavoie; Meganathan P. Ramakodi; John W. Finger; Alexander Suh; Sally R. Isberg; Lee G. Miles; Amanda Y. Chong; Weerachai Jaratlerdsiri; Jaime Gongora; C. Moran; Andrés Iriarte
INTRODUCTION Crocodilians and birds are the two extant clades of archosaurs, a group that includes the extinct dinosaurs and pterosaurs. Fossils suggest that living crocodilians (alligators, crocodiles, and gharials) have a most recent common ancestor 80 to 100 million years ago. Extant crocodilians are notable for their distinct morphology, limited intraspecific variation, and slow karyotype evolution. Despite their unique biology and phylogenetic position, little is known about genome evolution within crocodilians.
Figure: Evolutionary rates of tetrapods inferred from DNA sequences anchored by ultraconserved elements. Evolutionary rates among reptiles vary, with especially low rates among extant crocodilians but high rates among squamates. We have reconstructed the genomes of the common ancestor of birds and of all archosaurs (shown in gray silhouette, although the morphology of these species is uncertain).
RATIONALE Genome sequences for the American alligator, saltwater crocodile, and Indian gharial, representatives of all three extant crocodilian families, were obtained to facilitate better understanding of the unique biology of this group and provide a context for studying avian genome evolution. Sequence data from these three crocodilians and birds also allow reconstruction of the ancestral archosaurian genome.
RESULTS We sequenced shotgun genomic libraries from each species and used a variety of assembly strategies to obtain draft genomes for these three crocodilians. The assembled scaffold N50 was highest for the alligator (508 kilobases). Using a panel of reptile genome sequences, we generated phylogenies that confirm the sister relationship between crocodiles and gharials, the relationship with birds as members of extant Archosauria, and the outgroup status of turtles relative to birds and crocodilians. We also estimated evolutionary rates along branches of the tetrapod phylogeny using two approaches: ultraconserved element–anchored sequences and fourfold degenerate sites within stringently filtered orthologous gene alignments. Both analyses indicate that the rates of base substitution along the crocodilian and turtle lineages are extremely low. Supporting observations were made for transposable element content and for gene family evolution. Analysis of whole-genome alignments across a panel of reptiles and mammals showed that the rate of accumulation of micro-insertions and microdeletions is proportionally lower in crocodilians, consistent with a single underlying cause of a reduced rate of evolutionary change rather than intrinsic differences in base repair machinery. We hypothesize that this single cause may be a consistently longer generation time over the evolutionary history of Crocodylia. Low heterozygosity was observed in each genome, consistent with previous analyses, including the Chinese alligator. Pairwise sequential Markov chain analysis of regional heterozygosity indicates that during glacial cycles of the Pleistocene, each species suffered reductions in effective population size. The reduction was especially strong for the American alligator, whose current range extends farthest into regions of temperate climates.
CONCLUSION We used crocodilian, avian, and outgroup genomes to reconstruct 584 megabases of the archosaurian common ancestor genome and the genomes of key ancestral nodes. The estimated accuracy of the archosaurian genome reconstruction is 91% and is higher for conserved regions such as genes. The reconstructed genome can be improved by adding more crocodilian and avian genome assemblies and may provide a unique window to the genomes of extinct organisms such as dinosaurs and pterosaurs.
To provide context for the diversification of archosaurs—the group that includes crocodilians, dinosaurs, and birds—we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.
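The scaffold N50 quoted for the alligator assembly (508 kilobases) is a standard contiguity statistic: the length L such that scaffolds of length at least L together hold at least half of the assembled bases. The sketch below shows one common way to compute it. The function name and the toy scaffold lengths are illustrative assumptions, not code or data from the paper.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical, arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness; it is a contiguity measure only.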
Proceedings of the National Academy of Sciences of the United States of America | 2013
Younhee Ko; Seth A. Ament; James A. Eddy; Juan Caballero; John C. Earls; Leroy Hood; Nathan D. Price
To characterize gene expression patterns in the regional subdivisions of the mammalian brain, we integrated spatial gene expression patterns from the Allen Brain Atlas for the adult mouse with panels of cell type-specific genes for neurons, astrocytes, and oligodendrocytes from previously published transcriptome profiling experiments. We found that the combined spatial expression patterns of 170 neuron-specific transcripts revealed strikingly clear and symmetrical signatures for most of the brain’s major subdivisions. Moreover, the brain expression spatial signatures correspond to anatomical structures and may even reflect developmental ontogeny. Spatial expression profiles of astrocyte- and oligodendrocyte-specific genes also revealed regional differences; these defined fewer regions and were less distinct but still symmetrical in the coronal plane. Follow-up analysis suggested that region-based clustering of neuron-specific genes was related to (i) a combination of individual genes with restricted expression patterns, (ii) region-specific differences in the relative expression of functional groups of genes, and (iii) regional differences in neuronal density. Products from some of these neuron-specific genes are present in peripheral blood, raising the possibility that they could reflect the activities of disease- or injury-perturbed networks and collectively function as biomarkers for clinical disease diagnostics.
PLOS Genetics | 2014
Hong Li; Gustavo Glusman; Hao Hu; Shankaracharya; Juan Caballero; Robert Hubley; David J. Witherspoon; Stephen L. Guthery; Denise E. Mauldin; Lynn B. Jorde; Leroy Hood; Jared C. Roach; Chad D. Huff
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
Genome Biology and Evolution | 2015
Alexander Suh; Gennady Churakov; Meganathan P. Ramakodi; Roy N. Platt; Jerzy Jurka; Kenji K. Kojima; Juan Caballero; Arian Smit; Kent A. Vliet; Federico G. Hoffmann; Juergen Brosius; Richard E. Green; Edward L. Braun; David A. Ray; Juergen Schmitz
Chicken repeat 1 (CR1) retroposons are long interspersed elements (LINEs) that are ubiquitous within amniote genomes and constitute the most abundant family of transposed elements in birds, crocodilians, turtles, and snakes. They are also present in mammalian genomes, where they reside as numerous relics of ancient retroposition events. Yet, despite their relevance for understanding amniote genome evolution, the diversity and evolution of CR1 elements has never been studied on an amniote-wide level. We reconstruct the temporal and quantitative activity of CR1 subfamilies via presence/absence analyses across crocodilian phylogeny and comparative analyses of 12 crocodilian genomes, revealing relative genomic stasis of retroposition during genome evolution of extant Crocodylia. Our large-scale phylogenetic analysis of amniote CR1 subfamilies suggests the presence of at least seven ancient CR1 lineages in the amniote ancestor; and amniote-wide analyses of CR1 successions and quantities reveal differential retention (presence of ancient relics or recent activity) of these CR1 lineages across amniote genome evolution. Interestingly, birds and lepidosaurs retained the fewest ancient CR1 lineages among amniotes and also exhibit smaller genome sizes. Our study is the first to analyze CR1 evolution in a genome-wide and amniote-wide context and the data strongly suggest that the ancestral amniote genome contained myriad CR1 elements from multiple ancient lineages, and remnants of these are still detectable in the relatively stable genomes of crocodilians and turtles. Early mammalian genome evolution was thus characterized by a drastic shift from CR1 prevalence to dominance and hyperactivity of L2 LINEs in monotremes and L1 LINEs in therians.
PLOS ONE | 2012
Juan Caballero; Ana Garzón; Leticia González-Cintado; Wioleta Kowalczyk; Ignacio Torres; Gloria Calderita; Margarita Rodriguez; Virgínia Gondar; Juan Bernal; Carlos Ardavín; David Andreu; Thomas Zurcher; Cayetano von Kobbe
Cervical cancer is caused by persistent high-risk human papillomavirus (HR-HPV) infection and represents the second most frequent gynecological malignancy in the world. The HPV-16 type accounts for up to 55% of all cervical cancers. The HPV-16 oncoproteins E6 and E7 are necessary for induction and maintenance of malignant transformation and represent tumor-specific antigens for targeted cytotoxic T lymphocyte–mediated immunotherapy. Therapeutic cancer vaccines have become a challenging area of oncology research in recent decades. Among current cancer immunotherapy strategies, virus-like particle (VLP)–based vaccines have emerged as a potent and safe approach. We generated a vaccine (VLP-E7) incorporating a long C-terminal fragment of HPV-16 E7 protein into the infectious bursal disease virus VLP and tested its therapeutic potential in HLA-A2 humanized transgenic mice grafted with TC1/A2 tumor cells. We performed a series of tumor challenge experiments demonstrating a strong immune response against already-formed tumors (complete eradication). Remarkably, therapeutic efficacy was obtained with a single dose without adjuvant and against two injections of tumor cells, indicating a potent and long-lasting immune response.
PLOS ONE | 2014
Hong Li; Gustavo Glusman; Chad D. Huff; Juan Caballero; Jared C. Roach
Computing the genetic relationship between two humans is important to studies in genetics, genomics, genealogy, and forensics. Relationship algorithms may be sensitive to noise, such as that arising from sequencing errors or imperfect reference genomes. We developed an algorithm for estimation of genetic relationship by averaged blocks (GRAB) that is designed for whole-genome sequencing (WGS) data. GRAB segments the genome into blocks, calculates the fraction of blocks sharing identity, and then uses a classification tree to infer 1st- to 5th- degree relationships and unrelated individuals. We evaluated GRAB on simulated and real sequenced families, and compared it with other software. GRAB achieves similar performance, and does not require knowledge of population background or phasing. GRAB can be used in workflows for identifying unreported relationships, validating reported relationships in family-based studies, and detection of sample-tracking errors or duplicate inclusion. The software is available at familygenomics.systemsbiology.net/grab.
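The first two steps described in the abstract above (segment the genome into blocks, then compute the fraction of blocks sharing identity) can be sketched as follows. The abstract does not state the block size, the identity criterion, or the thresholds of the classification tree, so everything here is an assumption for illustration and not the GRAB implementation: genotypes are encoded as alternate-allele counts (0, 1, 2), a block counts as shared when it contains no opposite-homozygote site, and the function name and block size are hypothetical.

```python
def fraction_shared_blocks(geno_a, geno_b, block_size=4):
    """Fraction of fixed-size blocks with no opposite-homozygote site.

    geno_a, geno_b: equal-length sequences of alternate-allele counts
    (0, 1, or 2) for two individuals at the same ordered sites.
    The sharing criterion and block size are illustrative assumptions.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have the same length")
    if not geno_a:
        raise ValueError("need at least one site")
    shared = 0
    n_blocks = 0
    for start in range(0, len(geno_a), block_size):
        pairs = zip(geno_a[start:start + block_size],
                    geno_b[start:start + block_size])
        n_blocks += 1
        # A site with genotypes 0 and 2 shares no allele, so the block
        # cannot be identical by state across its whole length.
        if all(abs(a - b) < 2 for a, b in pairs):
            shared += 1
    return shared / n_blocks


# Toy genotypes (hypothetical): the second block has one 0-vs-2 site.
person_a = [0, 1, 2, 2, 0, 0, 1, 2]
person_b = [0, 1, 1, 2, 2, 0, 1, 2]
print(fraction_shared_blocks(person_a, person_b))
```

In a full workflow, fractions like this one would be the input features to the classifier that assigns a relationship degree; that step is omitted here because the abstract gives no detail on it.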
Nucleic Acids Research | 2014
Juan Caballero; Arian Smit; Leroy Hood; Gustavo Glusman
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
PLOS ONE | 2013
Gustavo Glusman; Juan Caballero; Max Robinson; Burak Kutlu; Leroy Hood
Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between normalized expression profiles of gene pairs. We also define four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, which is a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers.
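As a rough illustration of the first metric described above, the sketch below scales counts to a fixed total (one of the two commonly used methods the abstract mentions) and then counts "uniform" genes, those whose normalized expression has a low coefficient of variation. The abstract does not give the cutoff, so `max_cv` is an assumed parameter; the function names and the toy counts are likewise hypothetical.

```python
from statistics import mean, pstdev


def scale_to_fixed_total(counts, target=1_000_000):
    """Rescale each sample so its counts sum to `target`.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [target * row[j] / totals[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def count_uniform_genes(normalized, max_cv=0.25):
    """Count genes whose coefficient of variation is at most `max_cv`.

    The cutoff is an assumption; the abstract only says the coefficient
    of variation must be "sufficiently low".
    """
    uniform = 0
    for values in normalized.values():
        m = mean(values)
        if m > 0 and pstdev(values) / m <= max_cv:
            uniform += 1
    return uniform


# Toy data (hypothetical): genes A, B, C double between samples, D halves.
toy_counts = {
    "A": [10, 20],
    "B": [30, 60],
    "C": [60, 120],
    "D": [100, 50],
}
normalized = scale_to_fixed_total(toy_counts, target=1000)
print(count_uniform_genes(normalized, max_cv=0.25))
```

In the toy data, three genes change by exactly the same factor between samples, yet total-count scaling still leaves them with a nonzero coefficient of variation because the fourth, highly expressed gene shifts the totals. This is the kind of distortion that motivates comparing normalizations by how many uniform genes they produce.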
Cell Systems | 2017
Dhimankrishna Ghosh; Cory C. Funk; Juan Caballero; Nameeta Shah; Katherine Rouleau; John C. Earls; Liliana Soroceanu; Greg Foltz; Charles S. Cobbs; Nathan D. Price; Leroy Hood
We present a systems strategy that facilitated the development of a molecular signature for glioblastoma (GBM), composed of 33 cell-surface transmembrane proteins. This molecular signature, GBMSig, was developed through the integration of cell-surface proteomics and transcriptomics from patient tumors in the REMBRANDT (n = 228) and TCGA datasets (n = 547) and can separate GBM patients from control individuals with a Matthews correlation coefficient value of 0.87 in a lock-down test. Functionally, 17/33 GBMSig proteins are associated with transforming growth factor β signaling pathways, including CD47, SLC16A1, HMOX1, and MRC2. Knockdown of these genes impaired GBM invasion, reflecting their role in disease-perturbed changes in GBM. ELISA assays for a subset of GBMSig (CD44, VCAM1, HMOX1, and BIGH3) on 84 plasma specimens from multiple clinical sites revealed a high degree of separation of GBM patients from healthy control individuals (area under the curve is 0.98 in receiver operating characteristic). In addition, a classifier based on these four proteins differentiated the blood of pre- and post-tumor resections, demonstrating potential clinical value as biomarkers.
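The Matthews correlation coefficient cited above summarizes a binary classifier's confusion matrix in a single value between -1 and 1, and stays informative when the two classes differ in size. A minimal sketch of the standard formula follows; the function name and the example counts are hypothetical and are not taken from the GBMSig study.

```python
from math import sqrt


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
print(matthews_corrcoef_from_counts(tp=45, tn=45, fp=5, fn=5))
```

A value of 1 means perfect separation, 0 means performance no better than chance, and -1 means every prediction is inverted.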
International Conference on Bioinformatics | 2009
Juan Caballero; Gustavo Glusman
Coding and non-coding gene prediction is still a challenge. Diverse computer-based tools have been created to screen sequences using elaborate strategies for gene prediction. Many of these implement various statistical tests to measure the plausibility of a prediction, but until now a comprehensive negative control did not exist. We developed an algorithm that generates sequences with the characteristics of the intergenic regions of a genome, including nucleotide composition and typical inserted elements such as interspersed repeats, low-complexity sequences, and pseudogenes. We also ran several gene prediction programs on the artificial sequences and on real intergenic regions to compare the two.