Saurabh Baheti | Researchain

Publication

Featured research published by Saurabh Baheti.

American Journal of Human Genetics | 2016

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants

Nilah M. Ioannidis; Joseph H. Rothstein; Vikas Pejaver; Sumit Middha; Shannon K. McDonnell; Saurabh Baheti; Anthony M. Musolf; Qing Li; Emily Rose Holzinger; Danielle M. Karyadi; Lisa A. Cannon-Albright; Craig Teerlink; Janet L. Stanford; William B. Isaacs; Jianfeng F. Xu; Kathleen A. Cooney; Ethan M. Lange; Johanna Schleutker; John D. Carpten; Isaac J. Powell; Olivier Cussenot; Geraldine Cancel-Tassin; Graham G. Giles; Robert J. MacInnis; Christiane Maier; Chih-Lin Hsieh; Fredrik Wiklund; William J. Catalona; William D. Foulkes; Diptasri Mandal

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
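The AUC comparisons above rest on a standard identity: the area under the ROC curve equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen neutral one, with ties counted as one half. The following minimal Python sketch computes that quantity from two lists of scores using the rank-sum (Mann-Whitney U) formulation. It illustrates the metric only; it is not REVEL's code or the authors' evaluation pipeline, and the function name `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(pathogenic: Sequence[float], neutral: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as one half)."""
    n_pos, n_neg = len(pathogenic), len(neutral)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes need at least one score")
    # Pool the scores, remembering which class each came from (1 = pathogenic).
    scored = sorted([(s, 1) for s in pathogenic] + [(s, 0) for s in neutral])
    rank_sum_pos = 0.0
    i = 0
    while i < len(scored):
        # Find the block of tied scores scored[i..j].
        j = i
        while j + 1 < len(scored) and scored[j + 1][0] == scored[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        positives_in_block = sum(label for _, label in scored[i : j + 1])
        rank_sum_pos += avg_rank * positives_in_block
        i = j + 1
    # U counts (pathogenic, neutral) pairs where the pathogenic score is higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy scores (hypothetical, not from the study):
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

Sorting makes this O(n log n), so it remains practical for test sets on the scale of the ~124,000 variants mentioned above, unlike a naive all-pairs comparison.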

American Journal of Human Genetics | 2016

Mutations in GANAB, Encoding the Glucosidase IIα Subunit, Cause Autosomal-Dominant Polycystic Kidney and Liver Disease.

Binu Porath; Vladimir G. Gainullin; Emilie Cornec-Le Gall; Elizabeth K. Dillinger; Christina M. Heyer; Katharina Hopp; Marie E. Edwards; Charles D. Madsen; Sarah R. Mauritz; Carly J. Banks; Saurabh Baheti; Bharathi Reddy; José Ignacio Herrero; Jesus M. Banales; Marie C. Hogan; Velibor Tasic; Terry Watnick; Arlene B. Chapman; Cécile Vigneau; Frédéric Lavainne; Marie Pierre Audrezet; Claude Férec; Yannick Le Meur; Vicente E. Torres; Peter C. Harris

Autosomal-dominant polycystic kidney disease (ADPKD) is a common, progressive, adult-onset disease that is an important cause of end-stage renal disease (ESRD), which requires transplantation or dialysis. Mutations in PKD1 or PKD2 (∼85% and ∼15% of resolved cases, respectively) are the known causes of ADPKD. Extrarenal manifestations include an increased level of intracranial aneurysms and polycystic liver disease (PLD), which can be severe and associated with significant morbidity. Autosomal-dominant PLD (ADPLD) with no or very few renal cysts is a separate disorder caused by PRKCSH, SEC63, or LRP5 mutations. After screening, 7%-10% of ADPKD-affected and ∼50% of ADPLD-affected families were genetically unresolved (GUR), suggesting further genetic heterogeneity of both disorders. Whole-exome sequencing of six GUR ADPKD-affected families identified one with a missense mutation in GANAB, encoding glucosidase II subunit α (GIIα). Because PRKCSH encodes GIIβ, GANAB is a strong ADPKD and ADPLD candidate gene. Sanger screening of 321 additional GUR families identified eight further likely mutations (six truncating), and a total of 20 affected individuals were identified in seven ADPKD- and two ADPLD-affected families. The phenotype was mild PKD with variable, including severe, PLD. Analysis of GANAB-null cells showed an absolute requirement of GIIα for maturation and surface and ciliary localization of the ADPKD proteins (PC1 and PC2), and reduced mature PC1 was seen in GANAB(+/-) cells. PC1 surface localization in GANAB(-/-) cells was rescued by wild-type, but not mutant, GIIα. Overall, we show that GANAB mutations cause ADPKD and ADPLD and that the cystogenesis is most likely driven by defects in PC1 maturation.

BMC Bioinformatics | 2014

MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing

Krishna R. Kalari; Asha Nair; Jaysheel D. Bhavsar; Daniel O’Brien; Jaime Davila; Matthew A Bockol; Jinfu Nie; Xiaojia Tang; Saurabh Baheti; Jay B Doughty; Sumit Middha; Hugues Sicotte; Aubrey E. Thompson; Yan W. Asmann; Jean-Pierre A. Kocher

Background: Although the costs of next-generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used for obtaining genomic features from transcriptomic sequencing data, for any genome. Results: For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules, such as alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, summarization of transcriptomics data and final report. This workflow is available for human transcriptome analysis and can be easily adapted and used for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients. Conclusions: Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from http://bioinformaticstools.mayo.edu/research/maprseq/.

Bioinformatics | 2012

TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data

Yan W. Asmann; Sumit Middha; Asif Hossain; Saurabh Baheti; Ying Li; High-Seng Chai; Zhifu Sun; Patrick H. Duffy; Ahmed A. Hadad; Asha Nair; Xiaoyu Liu; Yuji Zhang; Eric W. Klee; Krishna R. Kalari; Jean-Pierre A. Kocher

Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm. Supplementary information: Supplementary data are provided at Bioinformatics online.

Frontiers in Oncology | 2012

Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations

Krishna R. Kalari; David Rossell; Brian M. Necela; Yan W. Asmann; Asha Nair; Saurabh Baheti; Jennifer M. Kachergus; Curtis S. Younkin; Tiffany R. Baker; Jennifer M. Carr; Xiaojia Tang; Michael P. Walsh; High Seng Chai; Zhifu Sun; Steven N. Hart; Alexey A. Leontovich; Asif Hossain; Jean Pierre A Kocher; Edith A. Perez; David Reisman; Alan P. Fields; E. Aubrey Thompson

KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA sequencing was performed on 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS). Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternative splicing isoforms and single nucleotide variants (SNVs), were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternative splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes with two or more connections in the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeting these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
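The abstract keeps only genes with two or more connections in the integrated network before pathway analysis. The following Python sketch shows such a degree filter over an undirected gene-gene edge list. It assumes edges are given as name pairs, that duplicate edges count once, and that self-loops are ignored. The abstract does not describe how the network itself was built, and the function name and toy edges are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def genes_with_min_degree(edges: Iterable[tuple[str, str]], min_degree: int = 2) -> set[str]:
    """Return genes that have at least `min_degree` distinct neighbours."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-loop is not a connection to another gene
        # Undirected graph: record the edge in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene for gene, nbrs in neighbours.items() if len(nbrs) >= min_degree}


# Hypothetical toy network; the repeated A-B edge is counted once.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("A", "B")]
print(sorted(genes_with_min_degree(toy_edges)))  # ['A', 'B']
```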

Clinical Cancer Research | 2015

New DNA Methylation Markers for Pancreatic Cancer: Discovery, Tissue Validation, and Pilot Testing in Pancreatic Juice

John B. Kisiel; Massimo Raimondo; William R. Taylor; Tracy C. Yab; Douglas W. Mahoney; Zhifu Sun; Sumit Middha; Saurabh Baheti; Hongzhi Zou; Thomas C. Smyrk; Lisa A. Boardman; Gloria M. Petersen; David A. Ahlquist

Purpose: Discriminant markers for pancreatic cancer detection are needed. We sought to identify and validate methylated DNA markers for pancreatic cancer using next-generation sequencing unbiased by known targets. Experimental Design: At a referral center, we conducted four sequential case–control studies: discovery, technical validation, biologic validation, and clinical piloting. Candidate markers were identified using variance-inflated logistic regression on reduced-representation bisulfite DNA sequencing results from matched pancreatic cancers, benign pancreas, and normal colon tissues. Markers were validated technically on replicate discovery study DNA and biologically on independent, matched, blinded tissues by methylation-specific PCR. Clinical testing of six methylation candidates and mutant KRAS was performed on secretin-stimulated pancreatic juice samples from 61 patients with pancreatic cancer, 22 with chronic pancreatitis, and 19 with normal pancreas on endoscopic ultrasound. Areas under the receiver operating characteristic curves (AUCs) for markers were calculated. Results: Sequencing identified >500 differentially hypermethylated regions. On independent tissues, AUCs for 19 selected markers ranged from 0.73 to 0.97. Pancreatic juice AUC values for CD1D, KCNK12, CLEC11A, NDRG4, IKZF1, PKRCB, and KRAS were 0.92*, 0.88, 0.85, 0.85, 0.84, 0.83, and 0.75, respectively, for pancreatic cancer compared with normal pancreas and 0.92*, 0.73, 0.76, 0.85*, 0.73, 0.77, and 0.62 for pancreatic cancer compared with chronic pancreatitis (*, P = 0.001 vs. KRAS). Conclusions: We identified and validated novel DNA methylation markers strongly associated with pancreatic cancer. On pilot testing in pancreatic juice, the best markers (especially CD1D) highly discriminated pancreatic cases from controls. Clin Cancer Res; 21(19); 4473–81. ©2015 AACR.

Bioinformatics | 2012

SAAP-RRBS

Zhifu Sun; Saurabh Baheti; Sumit Middha; Rahul Kanwar; Yuji Zhang; Xing Li; Andreas S. Beutler; Eric W. Klee; Yan W. Asmann; E. Aubrey Thompson; Jean-Pierre A. Kocher

Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging, and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with high-quality, analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation. Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the website http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm. Supplementary Information: Supplementary data are available at Bioinformatics online.
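The "methylation data extraction" step rests on the bisulfite principle: unmethylated cytosines are converted and read as T, while methylated cytosines are protected and still read as C. The methylation level at a CpG is therefore the fraction of C calls among C and T calls. The following Python sketch shows that calculation for forward-strand reads only; on the reverse strand the same signal appears as G versus A. It is not SAAP-RRBS's implementation. The input format (a plain string of base calls), the names `CpGCounts`, `tally_forward_strand`, and `methylation_level`, and the minimum depth of 10 reads are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpGCounts:
    chrom: str
    pos: int
    methylated: int    # reads showing C at the CpG cytosine (protected from conversion)
    unmethylated: int  # reads showing T (bisulfite-converted, so unmethylated)


def tally_forward_strand(chrom: str, pos: int, bases: str) -> CpGCounts:
    """Count C and T calls at one CpG cytosine from forward-strand base calls.

    `bases` is a hypothetical plain string of per-read base calls; other
    letters (e.g. sequencing errors such as A or G) are ignored.
    """
    upper = bases.upper()
    return CpGCounts(chrom, pos, upper.count("C"), upper.count("T"))


def methylation_level(site: CpGCounts, min_depth: int = 10) -> Optional[float]:
    """Fraction methylated, C / (C + T), or None if coverage is too low."""
    depth = site.methylated + site.unmethylated
    if depth < min_depth:
        return None  # too few informative reads to report a level
    return site.methylated / depth


site = tally_forward_strand("chr1", 1000, "CCCTTCCCCCTT")  # 8 C, 4 T
print(methylation_level(site))
```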

PLOS ONE | 2013

SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations.

Steven N. Hart; Vivekananda Sarangi; Raymond Moore; Saurabh Baheti; Jaysheel D. Bhavsar; Fergus J. Couch; Jean Pierre A Kocher

Background: Structural variation (SV) represents a significant, yet poorly understood, contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV, including split-read, discordant read-pair, and unmated pairs. Co-localized split reads and discordant read pairs are used to refine the breakpoints. Results: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are that 1) it does not require secondary (or exhaustive primary) alignment, 2) it is portable into established sequencing workflows, and 3) it is applicable to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read pairs, which on their own would not be sufficient to make an SV call. Conclusions: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
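The abstract describes two raw signals: soft-clipped bases on split reads and discordant read pairs. The following Python sketch extracts both for a single read or pair. It assumes SAM-style CIGAR strings and a standard forward/reverse (FR) paired-end library, in which the leftmost mate maps forward and the rightmost maps reverse. The 1,000 bp mate-distance cutoff and all names are illustrative assumptions. This is not SoftSearch's code, and it omits the clustering and breakpoint-refinement logic the abstract refers to.

```python
import re
from dataclasses import dataclass

# SAM CIGAR operations: a length followed by one operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts from a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips, so drop them before checking the ends.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)
    lead = core[0][0] if core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (lead, trail)


@dataclass
class ReadPair:
    chrom1: str
    pos1: int        # leftmost mapped position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def is_discordant(pair: ReadPair, max_distance: int = 1000) -> bool:
    """Flag pairs whose placement contradicts a standard FR library."""
    if pair.chrom1 != pair.chrom2:
        return True  # mates on different chromosomes
    if pair.reverse1 == pair.reverse2:
        return True  # both mates on the same strand
    # In an FR library the leftmost mate should be on the forward strand.
    left_is_reverse = pair.reverse1 if pair.pos1 <= pair.pos2 else pair.reverse2
    if left_is_reverse:
        return True  # mates point away from each other
    return abs(pair.pos2 - pair.pos1) > max_distance  # mates unexpectedly far apart


print(soft_clips("5S90M5S"))  # (5, 5)
print(is_discordant(ReadPair("chr13", 100, False, "chr13", 400, True)))  # False
```

In practice the distance cutoff would come from the library's observed insert-size distribution rather than a fixed constant.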

Epigenetics | 2014

Aberrant signature methylome by DNMT1 hot spot mutation in hereditary sensory and autonomic neuropathy 1E.

Zhifu Sun; Yanhong Wu; Tamas Ordog; Saurabh Baheti; Jinfu Nie; Xiaohui Duan; Kaori Hojo; Jean Pierre A Kocher; Peter James Dyck; Christopher J. Klein

DNA methyltransferase 1 (DNMT1) is essential for DNA methylation, gene regulation and chromatin stability. We previously discovered that DNMT1 mutations cause hereditary sensory and autonomic neuropathy type 1 with dementia and hearing loss (HSAN1E; OMIM 614116). HSAN1E is the first adult-onset neurodegenerative disorder caused by a defect in a methyltransferase gene. HSAN1E patients appear clinically normal until young adulthood, then begin developing the characteristic symptoms involving the central and peripheral nervous systems. Some HSAN1E patients also develop narcolepsy, and it has recently been suggested that HSAN1E is allelic to autosomal dominant cerebellar ataxia, deafness, and narcolepsy (ADCA-DN; OMIM 604121), which is also caused by mutations in DNMT1. A hotspot mutation, Y495C, within the targeting sequence domain of DNMT1 has been identified among HSAN1E patients. The mutant DNMT1 protein shows premature degradation and reduced DNA methyltransferase activity. Herein, we investigate genome-wide DNA methylation at single-base resolution through whole-genome bisulfite sequencing of germline DNA in 3 pairs of HSAN1E patients and their gender- and age-matched siblings. Over 1 billion 75-bp single-end reads were generated for each sample. In the 3 affected siblings, overall methylation loss was consistently found in all chromosomes, with X and 18 being most affected. Paired sample analysis identified 564,218 differentially methylated CpG sites (DMCs; P < 0.05), of which 300,134 were intergenic and 264,084 genic CpGs. Hypomethylation was predominant in both genic and intergenic regions, including promoters, exons, most CpG islands, L1, L2, Alu, and satellite repeats and simple repeat sequences. In some CpG islands, hypermethylated CpGs outnumbered hypomethylated CpGs. In 201 imprinted genes, there were more DMCs than in non-imprinted genes, and most were hypomethylated. Differentially methylated region (DMR) analysis identified 5,649 hypomethylated and 1,872 hypermethylated regions. Importantly, pathway analysis revealed that 1,693 genes associated with the identified DMRs were highly associated with diverse neurological disorders, and it implicated NAD+/NADH metabolism pathways in the pathogenesis. Our results provide novel insights into the epigenetic mechanism of neurodegeneration arising from a hotspot DNMT1 mutation and reveal pathways potentially important in a broad category of neurological and psychological disorders.
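The abstract reports differentially methylated CpG sites at P < 0.05 from a paired patient/sibling analysis, but it does not name the statistical test. One common choice for comparing methylated and unmethylated read counts at a single CpG between two samples is Fisher's exact test on a 2×2 table. The following Python sketch implements the two-sided version directly from hypergeometric probabilities using `math.comb`. Treating it as the authors' method is an assumption, the names `fisher_exact_two_sided` and `is_dmc` are hypothetical, and no multiple-testing adjustment is applied.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: patient, sibling. Columns: methylated reads, unmethylated reads.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Sum every table no more probable than the observed one
        # (small relative tolerance guards against float round-off).
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


def is_dmc(patient_meth: int, patient_unmeth: int,
           sibling_meth: int, sibling_unmeth: int, alpha: float = 0.05) -> bool:
    """Call a CpG differentially methylated if the unadjusted p-value < alpha."""
    return fisher_exact_two_sided(patient_meth, patient_unmeth,
                                  sibling_meth, sibling_unmeth) < alpha


print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.486
```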

Neurology | 2016

Target-enrichment sequencing and copy number evaluation in inherited polyneuropathy

Wei Wang; Chen Wang; D. Brian Dawson; Erik C. Thorland; Patrick A. Lundquist; Bruce W. Eckloff; Yanhong Wu; Saurabh Baheti; Jared M. Evans; Steven S. Scherer; Peter James Dyck; Christopher J. Klein

Objective: To assess the efficiency of target-enrichment next-generation sequencing (NGS) with copy number assessment in inherited neuropathy diagnosis. Methods: A 197-gene polyneuropathy panel was designed to assess for mutations in 93 patients with inherited or idiopathic neuropathy without known genetic cause. We applied our novel copy number variation algorithm to the NGS data and validated the identified copy number mutations using CytoScan (Affymetrix). The cost and efficacy of this targeted NGS approach were compared with those of earlier evaluations. Results: Average coverage depth was ∼760× (median = 600×, 99.4% > 100×). Among 93 patients, 18 mutations were identified in 17 cases (18%), including 3 copy number mutations: 2 PMP22 duplications and 1 MPZ duplication. The 2 patients with PMP22 duplication presented with bulbar and respiratory involvement and had absent extremity nerve conductions, leading to an axonal diagnosis. The average onset age of these 17 patients was 25 years (range 2–61 years), vs 45 years for those without a genetic discovery. Among those with onset age less than 40 years, the diagnostic yield of the targeted NGS approach was high (27%) and the cost savings were significant (∼20%). However, cost savings were not demonstrated for patients with late onset and without family history. Conclusions: Incorporating copy number analysis into the target-enrichment NGS approach improved the efficiency of mutation discovery for chronic, inherited, progressive length-dependent polyneuropathy diagnosis. The new technology is facilitating a simplified genetic diagnostic algorithm utilizing targeted NGS, clinical phenotypes, age at onset, and family history to improve diagnostic efficiency. Our findings prompt a need for updating the current practice parameters and payer guidelines.
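The coverage figures in this abstract (mean, median, and the share of positions above 100×) and the duplication calls both start from per-position read depth. The following Python sketch shows the summary statistics, plus a simple library-size-normalized depth ratio, a common starting point for copy number screening. For a one-copy gain on a diploid background, such as a heterozygous PMP22 duplication (3 copies versus 2), that ratio is expected to be near 1.5. The abstract does not describe the authors' CNV algorithm, so the ratio here is an assumption and not their method. The function names and toy depths are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def coverage_summary(depths: Sequence[int], threshold: int = 100) -> dict[str, float]:
    """Mean and median depth, plus the fraction of positions with depth > threshold."""
    if not depths:
        raise ValueError("no positions supplied")
    return {
        "mean": mean(depths),
        "median": median(depths),
        "frac_above": sum(d > threshold for d in depths) / len(depths),
    }


def normalized_depth_ratio(region_depth: float, sample_total: float,
                           ref_region_depth: float, ref_total: float) -> float:
    """Region depth relative to a reference sample, after scaling each by its total reads.

    Values near 1.0 suggest two copies; near 1.5 suggest a one-copy gain.
    """
    return (region_depth / sample_total) / (ref_region_depth / ref_total)


print(coverage_summary([50, 150, 600, 600, 1600]))
```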
