Publication


Featured research published by Naisha Shah.


Proceedings of the National Academy of Sciences of the United States of America | 2016

Deep sequencing of 10,000 human genomes

Amalio Telenti; Levi C. T. Pierce; William H. Biggs; Julia di Iulio; Emily H. M. Wong; Martin M. Fabani; Ewen F. Kirkness; Ahmed A. Moustafa; Naisha Shah; Chao Xie; Suzanne Brewerton; Nadeem Bulsara; Chad Garner; Gary Metzker; Efren Sandoval; Brad A. Perkins; Franz J. Och; Yaron Turpaz; J. Craig Venter

Significance: Large-scale initiatives toward personalized medicine are driving a massive expansion in the number of human genomes being sequenced. Therefore, there is an urgent need to define quality standards for clinical use. This includes deep coverage and sequencing accuracy of an individual’s genome. Our work represents the largest effort to date in sequencing human genomes at deep coverage with these new standards. This study identifies over 150 million human variants, a majority of them rare and unknown. Moreover, these data identify sites in the genome that are highly intolerant to variation, possibly essential for life or health. We conclude that high-coverage genome sequencing provides accurate detail on human variation for discovery and clinical applications.

We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.


Nature Medicine | 2015

Genome-wide identification of microRNAs regulating cholesterol and triglyceride homeostasis

Alexandre Wagschal; S. Hani Najafi-Shoushtari; Lifeng Wang; Leigh Goedeke; Sumita Sinha; Andrew S. deLemos; Josh C. Black; Cristina M. Ramírez; Yingxia Li; Ryan Tewhey; Ida J. Hatoum; Naisha Shah; Yong Lu; Fjoralba Kristo; Nikolaos Psychogios; Vladimir Vrbanac; Yi-Chien Lu; Timothy Hla; Rafael de Cabo; John S. Tsang; Eric E. Schadt; Pardis C. Sabeti; Sekar Kathiresan; David E. Cohen; Johnathan R. Whetstine; Raymond T. Chung; Carlos Fernández-Hernando; Lee M. Kaplan; Andre Bernards; Robert E. Gerszten

Genome-wide association studies (GWASs) have linked genes to various pathological traits. However, the potential contribution of regulatory noncoding RNAs, such as microRNAs (miRNAs), to a genetic predisposition to pathological conditions has remained unclear. We leveraged GWAS meta-analysis data from >188,000 individuals to identify 69 miRNAs in physical proximity to single-nucleotide polymorphisms (SNPs) associated with abnormal levels of circulating lipids. Several of these miRNAs (miR-128-1, miR-148a, miR-130b, and miR-301b) control the expression of key proteins involved in cholesterol-lipoprotein trafficking, such as the low-density lipoprotein (LDL) receptor (LDLR) and the ATP-binding cassette A1 (ABCA1) cholesterol transporter. Consistent with human liver expression data and genetic links to abnormal blood lipid levels, overexpression and antisense targeting of miR-128-1 or miR-148a in high-fat diet–fed C57BL/6J and Apoe-null mice resulted in altered hepatic expression of proteins involved in lipid trafficking and metabolism, and in modulated levels of circulating lipoprotein-cholesterol and triglycerides. Taken together, these findings support the notion that altered expression of miRNAs may contribute to abnormal blood lipid levels, predisposing individuals to human cardiometabolic disorders.
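The "physical proximity" screen described in this abstract can be illustrated with a minimal sketch. The following hypothetical Python snippet flags miRNAs whose annotated coordinates fall within a fixed window of lipid-associated GWAS SNPs; the window size, column names and input tables are assumptions for illustration, not the study's actual pipeline.

```python
# Illustrative sketch only: a hypothetical window-based screen for miRNAs in
# physical proximity to lipid-associated GWAS SNPs. Coordinates, the window
# size, and column names are assumptions, not the published method.
import pandas as pd

WINDOW_BP = 50_000  # assumed proximity window around each SNP

def mirnas_near_snps(mirnas: pd.DataFrame, snps: pd.DataFrame,
                     window: int = WINDOW_BP) -> pd.DataFrame:
    """mirnas: columns [name, chrom, start, end]; snps: columns [rsid, chrom, pos]."""
    hits = []
    for _, snp in snps.iterrows():
        # Keep miRNAs on the same chromosome whose span overlaps the SNP window.
        nearby = mirnas[
            (mirnas["chrom"] == snp["chrom"])
            & (mirnas["end"] >= snp["pos"] - window)
            & (mirnas["start"] <= snp["pos"] + window)
        ]
        for _, mir in nearby.iterrows():
            hits.append({"rsid": snp["rsid"], "miRNA": mir["name"]})
    return pd.DataFrame(hits).drop_duplicates()
```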


Journal of Clinical Investigation | 2014

Abnormal B cell memory subsets dominate HIV-specific responses in infected individuals

Lela Kardava; Susan Moir; Naisha Shah; Wei Wang; Richard Wilson; Clarisa M. Buckner; Brian H. Santich; Leo Kim; Emily Spurlin; Amy Nelson; Adam K. Wheatley; Christopher J. Harvey; Adrian B. McDermott; Kai W. Wucherpfennig; Tae-Wook Chun; John S. Tsang; Yuxing Li; Anthony S. Fauci

Recently, several neutralizing anti-HIV antibodies have been isolated from memory B cells of HIV-infected individuals. Despite extensive evidence of B cell dysfunction in HIV disease, little is known about the cells from which these rare HIV-specific antibodies originate. Accordingly, we used HIV envelope gp140 and CD4 or coreceptor (CoR) binding site (bs) mutant probes to evaluate HIV-specific responses in peripheral blood B cells of HIV-infected individuals at various stages of infection. In contrast to non-HIV responses, HIV-specific responses against gp140 were enriched within abnormal B cells, namely activated and exhausted memory subsets, which are largely absent in the blood of uninfected individuals. Responses against the CoRbs, which is a poorly neutralizing epitope, arose early, whereas those against the well-characterized neutralizing epitope CD4bs were delayed and infrequent. Enrichment of the HIV-specific response within resting memory B cells, the predominant subset in uninfected individuals, did occur in certain infected individuals who maintained low levels of plasma viremia and immune activation with or without antiretroviral therapy. The distribution of HIV-specific responses among memory B cell subsets was corroborated by transcriptional analyses. Taken together, our findings provide valuable insight into virus-specific B cell responses in HIV infection and demonstrate that memory B cell abnormalities may contribute to the ineffectiveness of the antibody response in infected individuals.


Nature Biotechnology | 2016

A crowdsourcing approach for reusing and meta-analyzing gene expression data

Naisha Shah; Yongjian Guo; Katherine V. Wendelsdorf; Yong Lu; Rachel Sparks; John S. Tsang

To the Editor: Advances in high-throughput technologies have led to a rapid increase in the amount of data generated on a molecular, cellular and organismal scale1,2. The reuse and meta-analysis of large-scale data from multiple independent studies can increase the statistical power to obtain new and robust biological insights, compared with the analysis of any one study, and may serve as a productive starting point for informing the design of experiments3. Previous studies have successfully combined publicly available data from published studies to both reposition drugs4 and identify robust gene-expression signatures of transplant rejection5, infection status6,7, tumor subtypes and cancer progression8. However, these meta-analysis approaches are not trivial, often requiring study-related information that is not always available, as well as computational and statistical expertise that could discourage direct, hands-on participation of many biologists. Here we present OMics Compendia Commons (OMiCC) (https://omicc.niaid.nih.gov), a freely available tool, aimed at biologists with limited bioinformatics training, that uses a crowdsourcing approach to help overcome some of these challenges. OMiCC enables the broader biomedical research community to generate and test hypotheses through reuse and (meta-) analysis of existing data sets. Annotations, metadata and components of cross-study data compendia created by users are stored and made available to other users of the platform so that they may build on previous analyses and contribute their own annotations and analysis designs. In this way, OMiCC may help bring down barriers across communities and encourage a culture of sharing and openness in biomedical research.

Millions of gene expression profiles reside in public databases1,2. These data could potentially be used to generate, assess or replicate hypotheses, even if the experiments were not originally designed to answer the same research questions. For example, data for evaluating the effect of a drug (in which drug-treated versus untreated subjects are compared) could be used to investigate the effects of gender on drug treatment. In addition, meta-analysis approaches9 will become increasingly effective for drawing robust conclusions from similar data sets generated from independent studies. However, the wealth of information available in public databases remains largely untapped, particularly by experimental biologists. One reason for this is that the steps involved in retrieving, processing and analyzing these data can be computationally and statistically complex for many biologists. Numerous resources have been created to enable the reuse and analysis of large-scale expression data (Supplementary Note 1), but they are generally limited to one or a subset of analytical steps, and therefore additional programming is still required for most workflows. Although commercial software has been developed to address some of these limitations, the algorithms are often proprietary, which makes incorporating external data into any analysis difficult, if not impossible. Furthermore, fee-based services could limit the size and diversity of the user community; less well-funded groups and research areas, as well as organizations from developing countries, tend to have less access. Another major barrier for both experimental and computational biologists alike is that structured meta-information critical for data reuse and cross-study analyses is typically not readily available. It is often necessary to determine which samples from a study can be grouped, which groups can be meaningfully compared (e.g., a particular type of tumor samples versus normal), and what groups can be collated or compared within and across studies. Constructing such sample groups and comparison pairs requires biological expertise specific to the biological domain of the study; doing so en masse for all available studies is thus enormously time-consuming and challenging.

OMiCC provides programming-free capabilities for the (meta-) analysis of public gene-expression data sets. It was also designed to serve as a ‘commons’ for engaging diverse segments of the biomedical research community to help build a comprehensive and reusable repository of meta-information (e.g., sample groups, pairs and their annotations), the essential building blocks for constructing data compendia and performing meta-analyses (Fig. 1a). OMiCC can further serve as an educational tool to help students learn new biology by exposing them to hands-on exploration of large-scale data sets. This didactic mission is particularly important as biology is increasingly dominated by ‘big data’-driven approaches. More than 26,000 pre-normalized and quality-checked human and mouse studies comprising ~690,000 expression profiles from the Gene Expression Omnibus (GEO) (Supplementary Note 2) are now accessible through OMiCC. A core feature is the ability to easily create, annotate and share comparison group pairs (CGPs; Fig. 1b). A CGP comprises two collections (called sample groups) of gene expression profiles from a study, for example, blood transcriptomes of diabetic patients and of healthy controls. OMiCC provides easy-to-use interfaces for constructing sample groups and CGPs and for annotating them using medical subject headings (MeSH)10, a standardized biomedical vocabulary used by PubMed, so that the resulting annotations are more easily interpretable and reusable by the community. Once a CGP is formed, OMiCC can compute significantly differentially expressed genes and a differential expression profile (DEP) capturing the differences in expression values for all genes between the sample groups (Fig. 1b). In contrast to approaches that use only statistically significant differentially expressed genes for comparison among CGPs4,11, DEPs can be collated across CGPs spanning one or more studies to form a data matrix operable by existing analysis tools and algorithms, including clustering and gene set enrichment analysis.
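As a rough illustration of the CGP/DEP idea described above, the following hypothetical Python sketch computes a per-gene differential expression profile between two sample groups (here, a mean log2 difference plus a simple two-sample t-test). It is not the OMiCC implementation; the input format and the choice of test are assumptions.

```python
# Illustrative sketch only: a minimal, hypothetical computation of a
# "differential expression profile" (DEP) for one comparison group pair (CGP),
# i.e., per-gene differences between two sample groups. Not the OMiCC code.
import pandas as pd
from scipy import stats

def differential_expression_profile(expr: pd.DataFrame,
                                     case_samples: list[str],
                                     control_samples: list[str]) -> pd.DataFrame:
    """expr: genes x samples matrix of log2 expression values."""
    cases = expr[case_samples]
    controls = expr[control_samples]

    # DEP: per-gene difference in mean (log2) expression between the groups.
    log2_fc = cases.mean(axis=1) - controls.mean(axis=1)

    # A simple two-sample test to flag significantly differentially expressed genes.
    t_stat, p_val = stats.ttest_ind(cases, controls, axis=1)

    return pd.DataFrame({"log2_fc": log2_fc, "t": t_stat, "p": p_val},
                        index=expr.index)

# DEPs from many CGPs could then be collated column-wise into a gene x CGP
# matrix for clustering or enrichment analysis, as described in the text.
```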


Proceedings of the National Academy of Sciences of the United States of America | 2018

Precision medicine screening using whole-genome sequencing and advanced imaging to identify disease risk in adults

Bradley A. Perkins; C. Thomas Caskey; Pamila Brar; Eric Dec; David S. Karow; Andrew M. Kahn; Ying-Chen Claire Hou; Naisha Shah; Debbie Boeldt; Erin Coughlin; Gabby Hands; Victor Lavrenko; James Yu; Andrea Procko; Julia Appis; Anders M. Dale; Lining Guo; Thomas J. Jönsson; Bryan M. Wittmann; István Bartha; Smriti Ramakrishnan; Axel Bernal; James B. Brewer; Suzanne Brewerton; William H. Biggs; Yaron Turpaz; J. Craig Venter

Significance: Advances in technology are enabling evaluation for prevention and early detection of age-related chronic diseases associated with premature mortality, such as cancer and cardiovascular diseases. These diseases kill about one-third of men and one-quarter of women between the ages of 50 and 74 in the United States. We used whole-genome sequencing, advanced imaging, and other clinical testing to screen 209 active, symptom-free adults. We identified a broad set of complementary age-related chronic disease risks associated with premature mortality.

Reducing premature mortality associated with age-related chronic diseases, such as cancer and cardiovascular disease, is an urgent priority. We report early results using genomics in combination with advanced imaging and other clinical testing to proactively screen for age-related chronic disease risk among adults. We enrolled active, symptom-free adults in a study of screening for age-related chronic diseases associated with premature mortality. In addition to personal and family medical history and other clinical testing, we obtained whole-genome sequencing (WGS), noncontrast whole-body MRI, dual-energy X-ray absorptiometry (DXA), global metabolomics, a new blood test for prediabetes (Quantose IR), echocardiography (ECHO), ECG, and cardiac rhythm monitoring to identify age-related chronic disease risks. Precision medicine screening using WGS and advanced imaging along with other testing among active, symptom-free adults identified a broad set of complementary age-related chronic disease risks associated with premature mortality and strengthened WGS variant interpretation. This and other similarly designed screening approaches anchored by WGS and advanced imaging may have the potential to extend healthy life among active adults through improved prevention and early detection of age-related chronic diseases (and their risk factors) associated with premature mortality.


American Journal of Human Genetics | 2018

Identification of Misclassified ClinVar Variants via Disease Population Prevalence

Naisha Shah; Ying-Chen Claire Hou; Hung-Chun Yu; Rachana Sainger; C. Thomas Caskey; J. Craig Venter; Amalio Telenti

There is significant interest in the standardized classification of human genetic variants. We used whole-genome sequence data from 10,495 unrelated individuals to contrast the population frequency of pathogenic variants with the expected population prevalence of the disease. Analyses included the 59 ACMG-recommended gene-condition sets for incidental findings and 463 genes associated with 265 OrphaNet conditions. A total of 25,505 variants were used to identify patterns of inflation (i.e., excess genetic risk and misclassification). Inflation increases as the level of evidence supporting the pathogenic nature of the variant decreases. We observed up to 11.5% of genetic disorders with inflation in pathogenic variant sets and up to 92.3% for the variant set with conflicting interpretations. This improved to 7.7% and 57.7%, respectively, after filtering for disease-specific allele frequency. The patterns of inflation were replicated using public data from more than 138,000 genomes. The burden of rare variants was a main contributing factor to the observed inflation, indicating collectively misclassified rare variants. We also analyzed the dynamics of reclassification of variant pathogenicity in ClinVar over time, which indicates progressive improvement in variant classification. The study shows that databases include a significant proportion of wrongly ascertained variants; however, it underscores the critical role of ClinVar in contrasting claims and fostering validation across submitters.
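The core "inflation" comparison described above, contrasting the aggregate frequency of reported pathogenic alleles with the expected disease prevalence, can be sketched as follows. This is a deliberately simplified illustration, not the authors' pipeline; the penetrance and inheritance handling are assumptions.

```python
# Illustrative sketch only (not the authors' pipeline): flag a gene-condition
# set as "inflated" when the aggregate frequency of its reported pathogenic
# alleles implies more at-risk individuals than the expected population
# prevalence of the disease. Penetrance and inheritance are simplified here.

def expected_affected_fraction(allele_freqs, inheritance="dominant"):
    """Crude estimate of the population fraction carrying a causal genotype."""
    if inheritance == "dominant":
        # Probability of carrying at least one pathogenic allele.
        p_unaffected = 1.0
        for f in allele_freqs:
            p_unaffected *= (1.0 - f) ** 2
        return 1.0 - p_unaffected
    elif inheritance == "recessive":
        # Hardy-Weinberg approximation from the summed pathogenic allele frequency.
        q = sum(allele_freqs)
        return q * q
    raise ValueError(f"unknown inheritance mode: {inheritance}")

def is_inflated(allele_freqs, disease_prevalence, inheritance="dominant"):
    """True if the implied genetic risk exceeds the expected disease prevalence."""
    return expected_affected_fraction(allele_freqs, inheritance) > disease_prevalence

# Example: variants whose combined carrier frequency (~1 in 50) far exceeds a
# 1-in-10,000 condition would be flagged for possible misclassification.
print(is_inflated([0.005, 0.005], disease_prevalence=1e-4))  # True
```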


bioRxiv | 2016

The human functional genome defined by genetic diversity

Julia di Iulio; István Bartha; Emily S. W. Wong; Hung-Chun Yu; Michael A. Hicks; Naisha Shah; Victor Lavrenko; Ewen F. Kirkness; Martin M. Fabani; Dongchan Yang; Inkyung Jung; William Biggs; Bing Ren; J. Craig Venter; Amalio Telenti

Large-scale efforts to sequence whole human genomes provide extensive data on the non-coding portion of the genome. We used variation information from 11,257 human genomes to describe the spectrum of sequence conservation in the population. We established the genome-wide variability for each nucleotide in the context of the surrounding sequence in order to identify departures from expectation at the population level (context-dependent conservation). We characterized the population diversity of functional elements in the genome and identified the coordination of conserved sequences of distal and cis enhancers, chromatin marks, promoters, coding and intronic regions. The most context-dependent conserved regions of the genome are associated with unique functional annotations and a genomic organization that spreads up to one megabase. Importantly, these regions are enriched more than 100-fold for non-coding pathogenic variants. This analysis of human genetic diversity thus provides a detailed view of sequence conservation, functional constraint and genomic organization of the human genome. Specifically, it identifies highly conserved non-coding sequences that are not captured by analysis of interspecies conservation and are greatly enriched in disease variants.


EBioMedicine | 2018

Acetaminophen (Paracetamol) Use Modifies the Sulfation of Sex Hormones

Isaac V. Cohen; Elizabeth T. Cirulli; Matthew W. Mitchell; Thomas J. Jönsson; James Yu; Naisha Shah; Tim D. Spector; Lining Guo; J. Craig Venter; Amalio Telenti

Background: Acetaminophen (paracetamol) is one of the most common medications used for the management of pain in the world. There is a lack of consensus about its mechanism of action, and concern about the possibility of adverse effects on reproductive health.

Methods: We first established the metabolome profile that characterizes use of acetaminophen, and we subsequently trained and tested a model that identified metabolomic differences across samples from 455 individuals with and without acetaminophen use. We validated the findings in a European-ancestry adult twin cohort of 1880 individuals (TwinsUK), and in a study of 1235 individuals of African American and Hispanic ancestry. We used genomics to elucidate the mechanisms targeted by acetaminophen.

Findings: We identified a distinctive pattern of depletion of sulfated sex hormones with use of acetaminophen across all populations. We used a Mendelian randomization approach to characterize the role of Sulfotransferase Family 2A Member 1 (SULT2A1) as the site of the interaction. Although CYP3A7-CYP3A51P variants also modified levels of some sulfated sex hormones, only acetaminophen use phenocopied the effect of genetic variants of SULT2A1. Overall, acetaminophen use, age, gender and SULT2A1 and CYP3A7-CYP3A51P genetic variants are key determinants of variation in levels of sulfated sex hormones in blood. The effect of taking acetaminophen on sulfated sex hormones was roughly equivalent to the effect of 35 years of aging.

Interpretation: These findings raise concerns about the impact of acetaminophen use on hormonal homeostasis. In addition, they modify views on the mechanism of action of acetaminophen in pain management, as sulfated sex hormones can function as neurosteroids and modify nociceptive thresholds.


bioRxiv | 2015

Building Genomic Analysis Pipelines in a Hackathon Setting with Bioinformatician Teams: DNA-seq, Epigenomics, Metagenomics and RNA-seq

Ben Busby; Allissa Dillman; Claire L. Simpson; Ian Fingerman; Sijung Yun; David M. Kristensen; Lisa Federer; Naisha Shah; Matthew C. LaFave; Laura Jimenez-Barron; Manusha Pande; Wen Luo; Brendan Miller; Cem Mayden; Dhruva Chandramohan; Kipper Fletez-Brant; Paul W. Bible; Sergej Nowoshilow; Alfred Chan; Eric Jc Galvez; Jeremy F. Chignell; Joseph N. Paulson; Manoj Kandpal; Suhyeon Yoon; Esther Asaki; Abhinav Nellore; Adam Stine; Robert D. Sanders; Jesse Becker; Matt Lesko

We assembled teams of genomics professionals to assess whether we could rapidly develop pipelines to answer biological questions commonly asked by biologists and others new to bioinformatics by facilitating analysis of high-throughput sequencing data. In January 2015, teams were assembled on the National Institutes of Health (NIH) campus to address questions in the DNA-seq, epigenomics, metagenomics and RNA-seq subfields of genomics. The only two rules for this hackathon were that either the data used were housed at the National Center for Biotechnology Information (NCBI) or would be submitted there by a participant in the next six months, and that all software going into the pipeline was open-source or open-use. Questions proposed by organizers, as well as suggested tools and approaches, were distributed to participants a few days before the event and were refined during the event. Pipelines were published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development (https://github.com/features/). The code was published at https://github.com/DCGenomics/ with separate repositories for each team, starting with hackathon_v001.


bioRxiv | 2018

Profound perturbation of the human metabolome by obesity

Elizabeth T. Cirulli; Lining Guo; Christine Leon Swisher; Naisha Shah; Lei Huang; Lori A. Napier; Ewen F. Kirkness; Tim D. Spector; C. Thomas Caskey; Bernard Thorens; J. Craig Venter; Amalio Telenti

Obesity is a heterogeneous phenotype that is crudely measured by body mass index (BMI). More precise phenotyping and categorization of risk in large numbers of people with obesity is needed to advance clinical care and drug development. Here, we used non-targeted metabolome analysis and whole-genome sequencing to identify metabolic and genetic signatures of obesity. We collected anthropometric and metabolic measurements at three timepoints over a median of 13 years in 1,969 adult twins of European ancestry and at a single timepoint in 427 unrelated volunteers. We observe that obesity results in a profound perturbation of the metabolome; nearly a third of the assayed metabolites are associated with changes in BMI. A metabolome signature identifies the healthy obese and also identifies lean individuals with abnormal metabolomes; these groups differ in health outcomes and underlying genetic risk. Because metabolome profiling identifies clinically meaningful heterogeneity in obesity, this approach could help select patients for clinical trials.
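As a minimal illustration of the kind of metabolite-BMI association scan described above, the following hypothetical Python sketch regresses BMI on each metabolite with a simple covariate adjustment. Column names and covariates are assumptions; the study's actual models and methods may differ.

```python
# Illustrative sketch only: a minimal association scan of metabolite levels
# against BMI using ordinary least squares, with an assumed age/sex adjustment.
# This is not the study's statistical model.
import pandas as pd
import statsmodels.api as sm

def metabolite_bmi_associations(metabolites: pd.DataFrame,
                                covariates: pd.DataFrame,
                                bmi: pd.Series) -> pd.DataFrame:
    """metabolites: samples x metabolites; covariates: samples x covariates (e.g., age, sex)."""
    results = []
    for name in metabolites.columns:
        # Design matrix: intercept + this metabolite + covariates.
        X = sm.add_constant(pd.concat([metabolites[[name]], covariates], axis=1))
        fit = sm.OLS(bmi, X, missing="drop").fit()
        results.append({"metabolite": name,
                        "beta": fit.params[name],
                        "p": fit.pvalues[name]})
    # Smallest p-values first; multiple-testing correction would follow.
    return pd.DataFrame(results).sort_values("p")
```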

Collaboration


Dive into Naisha Shah's collaborations.

Top Co-Authors

J. Craig Venter (J. Craig Venter Institute)
Amalio Telenti (J. Craig Venter Institute)
C. Thomas Caskey (Baylor College of Medicine)
Hung-Chun Yu (University of Colorado Denver)
John S. Tsang (National Institutes of Health)
Michael A. Hicks (Massachusetts Institute of Technology)
Yong Lu (National Institutes of Health)