Benilton Carvalho
State University of Campinas
Publications
Featured research published by Benilton Carvalho.
Nature Methods | 2015
Wolfgang Huber; Vincent J. Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton Carvalho; Héctor Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D. Hansen; Rafael A. Irizarry; Michael S. Lawrence; Michael I. Love; James W. MacDonald; Valerie Obenchain; Andrzej K. Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K. Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
Bioinformatics | 2010
Benilton Carvalho; Rafael A. Irizarry
MOTIVATION The availability of flexible open source software for the analysis of gene expression raw level data has greatly facilitated the development of widely used preprocessing methods for these technologies. However, the expansion of microarray applications has exposed the limitation of existing tools. RESULTS We developed the oligo package to provide a more general solution that supports a wide range of applications. The package is based on the BioConductor principles of transparency, reproducibility and efficiency of development. It extends the existing tools and leverages existing code for visualization, accessing data and widely used preprocessing routines. The oligo package implements a unified paradigm for preprocessing data and interfaces with other BioConductor tools for downstream analysis. Our infrastructure is general and can be used by other BioConductor packages. AVAILABILITY The oligo package is freely available through BioConductor, http://www.bioconductor.org.
Nature Genetics | 2014
Jamie M.J. Weaver; Caryn S. Ross-Innes; Nicholas Shannon; Andy G. Lynch; Tim Forshew; Mariagnese Barbera; Muhammed Murtaza; Chin-Ann J. Ong; Pierre Lao-Sirieix; Mark J. Dunning; Laura Smith; M.L.R. Smith; Charlotte Anderson; Benilton Carvalho; Maria O'Donovan; Timothy J. Underwood; Andrew May; Nicola Grehan; Richard H. Hardwick; Jim Davies; Arusha Oloumi; Sam Aparicio; Carlos Caldas; Matthew Eldridge; Paul A.W. Edwards; Nitzan Rosenfeld; Simon Tavaré; Rebecca C. Fitzgerald
Cancer genome sequencing studies have identified numerous driver genes, but the relative timing of mutations in carcinogenesis remains unclear. The gradual progression from premalignant Barrett's esophagus to esophageal adenocarcinoma (EAC) provides an ideal model to study the ordering of somatic mutations. We identified recurrently mutated genes and assessed clonal structure using whole-genome sequencing and amplicon resequencing of 112 EACs. We next screened a cohort of 109 biopsies from 2 key transition points in the development of malignancy: benign metaplastic never-dysplastic Barrett's esophagus (NDBE; n = 66) and high-grade dysplasia (HGD; n = 43). Unexpectedly, the majority of recurrently mutated genes in EAC were also mutated in NDBE. Only TP53 and SMAD4 mutations occurred in a stage-specific manner, confined to HGD and EAC, respectively. Finally, we applied this knowledge to identify high-risk Barrett's esophagus in a new non-endoscopic test. In conclusion, mutations in EAC driver genes generally occur exceptionally early in disease development with profound implications for diagnostic and therapeutic strategies.
Bioinformatics | 2008
Henrik Bengtsson; Rafael A. Irizarry; Benilton Carvalho; Terence P. Speed
MOTIVATION Although copy-number aberrations are known to contribute to the diversity of the human DNA and cause various diseases, many aberrations and their phenotypes are still to be explored. The recent development of single-nucleotide polymorphism (SNP) arrays provides researchers with tools for calling genotypes and identifying chromosomal aberrations at an order-of-magnitude greater resolution than possible a few years ago. The fundamental problem in array-based copy-number (CN) analysis is to obtain CN estimates at a single-locus resolution with high accuracy and precision such that downstream segmentation methods are more likely to succeed. RESULTS We propose a preprocessing method for estimating raw CNs from Affymetrix SNP arrays. Its core utilizes a multichip probe-level model analogous to that for high-density oligonucleotide expression arrays. We extend this model by adding an adjustment for sequence-specific allelic imbalances such as cross-hybridization between allele A and allele B probes. We focus on total CN estimates, which allows us to further constrain the probe-level model to increase the signal-to-noise ratio of CN estimates. Further improvement is obtained by controlling for PCR effects. Each part of the model is fitted robustly. The performance is assessed by quantifying how well raw CNs alone differentiate between one and two copies on Chromosome X (ChrX) at a single-locus resolution (27kb) up to a 200kb resolution. The evaluation is done with publicly available HapMap data. AVAILABILITY The proposed method is available as part of an open-source R package named aroma.affymetrix. Because it is a bounded-memory algorithm, any number of arrays can be analyzed.
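As a rough illustration of the kind of multichip probe-level model mentioned above, the sketch below fits an additive model on the log2 scale, log2(y[i][j]) = overall + array effect + probe effect + residual, using Tukey's median polish, a robust scheme commonly used for expression arrays. Treating this as a stand-in is an assumption: the abstract does not state how aroma.affymetrix fits its model, and the function name, toy data and iteration count here are hypothetical.

```python
from statistics import median


def median_polish(log_table, n_iter=10):
    """Robustly fit log_table[i][j] ~ overall + row_eff[i] + col_eff[j].

    Rows stand for arrays and columns for probes (an illustrative layout,
    not taken from the paper). Returns (overall, row_eff, col_eff, resid).
    """
    n_rows, n_cols = len(log_table), len(log_table[0])
    resid = [row[:] for row in log_table]  # work on a copy
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the row effects.
        for i in range(n_rows):
            r = median(resid[i])
            row_eff[i] += r
            resid[i] = [v - r for v in resid[i]]
        # Keep column effects centred; move their median into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Sweep column medians out of the residuals into the column effects.
        for j in range(n_cols):
            c = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += c
            for i in range(n_rows):
                resid[i][j] -= c
        # Keep row effects centred as well.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

    return overall, row_eff, col_eff, resid


# Toy log2 intensities: 3 arrays x 4 probes, built to be exactly additive.
array_effects = [0.0, 1.0, 2.0]
probe_effects = [8.0, 9.0, 10.5, 12.0]
toy = [[a + p for p in probe_effects] for a in array_effects]

overall, row_eff, col_eff, resid = median_polish(toy)
print("array effects relative to the median array:", row_eff)
```

On purely additive input such as this toy table the residuals vanish, and differences between the fitted array effects recover the differences in log-intensity between arrays.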
Arteriosclerosis, Thrombosis, and Vascular Biology | 2007
Veronica Fernandes; Joseph F. Polak; Susan Cheng; Boaz D. Rosen; Benilton Carvalho; Khurram Nasir; Robyn L. McClelland; Gregory Hundley; Greg Pearson; Daniel H. O'Leary; David A. Bluemke; Joao A.C. Lima
Objective: The pathophysiology of left ventricular (LV) dysfunction, particularly in the setting of a preserved ejection fraction (EF), remains unclear. Few studies have investigated the relationship between arterial compliance and LV function in humans, and none used cardiovascular MRI. Methods and Results: We sought to determine whether arterial compliance is related to regional myocardial function among participants of the Multi-Ethnic Study of Atherosclerosis (MESA). Arterial compliance was assessed using carotid ultrasound measurements to calculate the distensibility coefficient (DC) and Young’s modulus (YM). Circumferential systolic (SRS) and diastolic (SRE) strain rates were calculated by harmonic phase (HARP) from tagged MRI. Associations between arterial compliance and indices of ventricular function were adjusted for cardiovascular risk factors. We found a significant association between arterial compliance and SRS in all myocardial regions (P<0.05); arterial compliance was also associated with SRE in the lateral and septal wall regions (P<0.05). Multiple linear regression analyses demonstrated a direct linear relationship between the carotid artery DC and SRS across all LV segments and slices, even after adjustment for cardiovascular risk factors and LV mass. In regression analyses, a significant relationship between arterial compliance and SRE in the septal and antero-apical walls was also found and remained significant after multivariable adjustment. Conclusion: Arterial stiffness is associated with early and asymptomatic impairment of systolic as well as diastolic myocardial function. Further studies are needed to elucidate the role of vascular compliance in the development of ventricular dysfunction and failure.
Journal of the American College of Cardiology | 2014
Anders Opdahl; Bharath Ambale Venkatesh; Veronica Rolim S. Fernandes; Colin O. Wu; Khurram Nasir; Eui-Young Choi; Andre L.C. Almeida; Boaz D. Rosen; Benilton Carvalho; Thor Edvardsen; David A. Bluemke; Joao A.C. Lima
OBJECTIVES The objective of this study was to investigate the relationship between baseline resting heart rate and incidence of heart failure (HF) and global and regional left ventricular (LV) dysfunction. BACKGROUND The association of resting heart rate to HF and LV function has not been well described in an asymptomatic multi-ethnic population. METHODS Resting heart rate was measured in participants in the MESA (Multi-Ethnic Study of Atherosclerosis) trial at inclusion. Incident HF was registered (n = 176) during follow-up (median 7 years) in those who underwent cardiac magnetic resonance imaging (n = 5,000). Changes in ejection fraction (ΔEF) and peak circumferential strain (Δεcc) were measured as markers of developing global and regional LV dysfunction in 1,056 participants imaged at baseline and 5 years later. Time to HF (Cox model) and Δεcc and ΔEF (multiple linear regression models) were adjusted for demographics, traditional cardiovascular risk factors, calcium score, LV end-diastolic volume, and mass in addition to resting heart rate. RESULTS Cox analysis demonstrated that for 1 beat/min increase in resting heart rate, there was a 4% greater adjusted relative risk for incident HF (hazard ratio: 1.04; 95% CI: 1.02 to 1.06; p < 0.001). Adjusted multiple regression models demonstrated that resting heart rate was positively associated with deteriorating εcc and decrease in EF, even when all coronary heart disease events were excluded from the model. CONCLUSIONS Elevated resting heart rate was associated with increased risk for incident HF in asymptomatic participants in the MESA trial. Higher heart rate was related to development of regional and global LV dysfunction independent of subclinical atherosclerosis and coronary heart disease. (Multi-Ethnic Study of Atherosclerosis [MESA]; NCT00005487).
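Under a Cox model with a linear heart-rate term, the log hazard is linear in the covariate, so a per-beat hazard ratio can be rescaled to larger heart-rate differences by exponentiation. The snippet below is only a worked arithmetic example using the figures reported above. The helper name is hypothetical, and extrapolating beyond the heart-rate range observed in the study is not supported by the abstract.

```python
import math


def scale_hazard_ratio(hr_per_unit, delta):
    """Hazard ratio for a `delta`-unit increase, assuming a log-linear effect."""
    return math.exp(delta * math.log(hr_per_unit))


# Values reported in the abstract, per 1 beat/min increase.
hr, ci_low, ci_high = 1.04, 1.02, 1.06

for delta in (1, 5, 10):
    # Confidence limits scale the same way because they are limits on the
    # log hazard ratio, which is multiplied by delta.
    print(
        f"+{delta} beats/min: HR {scale_hazard_ratio(hr, delta):.2f} "
        f"(95% CI {scale_hazard_ratio(ci_low, delta):.2f} "
        f"to {scale_hazard_ratio(ci_high, delta):.2f})"
    )
```

Under this assumption, a difference of 10 beats/min corresponds to an adjusted hazard ratio of about 1.48.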
Genes, Chromosomes and Cancer | 2008
Marianne Tuefferd; An De Bondt; Ilse Van den Wyngaert; Willem Talloen; Tobias Verbeke; Benilton Carvalho; Djork-Arné Clevert; Marco Alifano; Nandini Raghavan; Dhammika Amaratunga; Hinrich Göhlmann; Philippe Broët; Sophie Camilleri-Broët
SNP arrays offer the opportunity to get a genome‐wide view on copy number alterations and are increasingly used in oncology. DNA from formalin‐fixed paraffin‐embedded material (FFPE) is partially degraded which limits the application of those technologies for retrospective studies. We present the use of Affymetrix GeneChip SNP6.0 for identification of copy number alterations in fresh frozen (FF) and matched FFPE samples. Fifteen pairs of adenocarcinomas with both frozen and FFPE material were analyzed. We present an optimization of the sample preparation and show the importance of correcting the measured intensities for fragment length and GC‐content when using FFPE samples. The absence of GC content correction results in a chromosome specific “wave pattern” which may lead to the misclassification of genomic regions as being altered. The highest concordance between FFPE and matched FF was found in samples with the highest call rates. Nineteen of the 23 high level amplifications (83%) seen using FF samples were also detected in the corresponding FFPE material. For limiting the rate of “false positive” alterations, we have chosen a conservative False Discovery Rate (FDR). We observed better results using SNP probes than CNV probes for copy number analysis of FFPE material. This is the first report on the detection of copy number alterations in FFPE samples using Affymetrix GeneChip SNP6.0.
Bioinformatics | 2010
Benilton Carvalho; Thomas A. Louis; Rafael A. Irizarry
MOTIVATION Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used by GWAS to determine association between loci and disease is making genotype calls (AA, AB or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays and different sample batches has a substantial influence on the accuracy of genotype calls made by existing algorithms. Failure to account for these sources of variability can adversely affect the quality of findings reported by the GWAS. RESULTS We developed a method based on an enhanced version of the multi-level model used by CRLMM version 1. Two key differences are that we now account for variability across batches and improve the call-specific assessment of each call. The new model permits the development of quality metrics for SNPs, samples and batches of samples. Using three independent datasets, we demonstrate that the CRLMM version 2 outperforms CRLMM version 1 and the algorithm provided by Affymetrix, Birdseed. The main advantage of the new approach is that it enables the identification of low-quality SNPs, samples and batches. AVAILABILITY Software implementing the method described in this article is available as free and open source code in the crlmm R/BioConductor package.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
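To make the idea of turning probe intensities into AA, AB or BB calls concrete, here is a deliberately simplified nearest-center caller on the log-ratio scale. It is not the CRLMM algorithm, which uses a multi-level model and accounts for batch. The cluster centers, the function name and the margin used as a crude confidence measure are all hypothetical choices made for illustration.

```python
import math

# Hypothetical cluster centers on the scale M = log2(A) - log2(B).
# Real centers would be estimated from data, per SNP (and per batch).
CENTERS = {"AA": 2.0, "AB": 0.0, "BB": -2.0}


def call_genotype(intensity_a, intensity_b, centers=CENTERS):
    """Return (call, margin) for one SNP in one sample.

    `call` is the genotype whose center is closest to the log-ratio M.
    `margin` is the gap between the second-closest and closest distances;
    a small margin flags an ambiguous call.
    """
    m = math.log2(intensity_a) - math.log2(intensity_b)
    ranked = sorted(centers, key=lambda g: abs(m - centers[g]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(m - centers[runner_up]) - abs(m - centers[best])
    return best, margin


# Toy allele A / allele B intensities for three samples at one SNP.
for a, b in [(4000.0, 1000.0), (1500.0, 1400.0), (300.0, 2400.0)]:
    call, margin = call_genotype(a, b)
    print(f"A={a:.0f} B={b:.0f} -> {call} (margin {margin:.2f})")
```

A caller of this kind has no notion of array or batch quality, which is the gap the abstract above says the batch-aware model is meant to address.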
Biostatistics | 2011
Robert B. Scharpf; Ingo Ruczinski; Benilton Carvalho; Betty Doan; Aravinda Chakravarti; Rafael A. Irizarry
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs in the genome. Genomewide association studies (GWAS) may simultaneously screen for copy number phenotype and single nucleotide polymorphism (SNP) phenotype associations as part of the analytic strategy. However, genomewide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post hoc quality control procedures to exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of biallelic genotype calls from experimental data to estimate batch-specific and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in the quantile-normalized intensities, while the latter illustrates the robustness of our approach to a data set in which approximately 27% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package crlmm at Bioconductor (http://www.bioconductor.org).
Bioinformatics | 2009
Matthew E. Ritchie; Benilton Carvalho; Kurt N. Hetrick; Simon Tavaré; Rafael A. Irizarry
Summary: Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10^5 and 10^6 single nucleotide polymorphisms (SNPs) per sample. Although the platform is widely used, there is a shortage of open source software to process its raw intensities into genotype calls. To this end, we have developed the R/Bioconductor package crlmm for analyzing BeadChip data. After careful preprocessing, our software applies the CRLMM algorithm to produce genotype calls, confidence scores and other quality metrics at both the SNP and sample levels. We provide access to the raw summary-level intensity data, allowing users to develop their own methods for genotype calling or copy number analysis if they wish. Availability and Implementation: The crlmm Bioconductor package is available from http://www.bioconductor.org. Data packages and documentation are available from http://rafalab.jhsph.edu/software.html.