Beifang Niu
Washington University in St. Louis
Publication
Featured research published by Beifang Niu.
Nature | 2013
Cyriac Kandoth; Michael D. McLellan; Fabio Vandin; Kai Ye; Beifang Niu; Charles Lu; Mingchao Xie; Qunyuan Zhang; Joshua F. McMichael; Matthew A. Wyczalkowski; Mark D. M. Leiserson; Christopher A. Miller; John S. Welch; Matthew J. Walter; Michael C. Wendl; Timothy J. Ley; Richard Wilson; Benjamin J. Raphael; Li Ding
The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.
Bioinformatics | 2012
Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions. Availability: http://cd-hit.org. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics | 2010
Ying Huang; Beifang Niu; Ying Gao; Limin Fu; Weizhong Li
Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. In order to further assist the CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels. Availability: Free access at http://cd-hit.org. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
BMC Genomics | 2011
Sitao Wu; Zhengwei Zhu; Liming Fu; Beifang Niu; Weizhong Li
Background: The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; also metagenomic annotation involves a wide range of computational tools, which are difficult to be installed and maintained by common users. The tools provided by the few available web servers are also limited and have various constraints such as login requirement, long waiting time, inability to configure pipelines, etc. Results: We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, functional annotation, etc. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis. Conclusions: WebMGA offers to researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.
Nature Genetics | 2015
Mark D. M. Leiserson; Fabio Vandin; Hsin-Ta Wu; Jason R. Dobson; Jonathan V Eldridge; Jacob L Thomas; Alexandra Papoutsaki; Younhun Kim; Beifang Niu; Michael D. McLellan; Michael S. Lawrence; Abel Gonzalez-Perez; David Tamborero; Yuwei Cheng; Gregory A Ryslik; Nuria Lopez-Bigas; Gad Getz; Li Ding; Benjamin J. Raphael
Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.
Leukemia | 2013
Matthew J. Walter; Dong Shen; Jin Shao; Li Ding; Brian S. White; Cyriac Kandoth; Christopher A. Miller; Beifang Niu; Michael D. McLellan; Nathan D. Dees; Robert S. Fulton; K Elliot; Simon Heath; Marcus Grillot; Peter Westervelt; Daniel C. Link; John F. DiPersio; Elaine R. Mardis; Timothy J. Ley; Richard Wilson; Timothy A. Graubert
Recent studies suggest that most cases of myelodysplastic syndrome (MDS) are clonally heterogeneous, with a founding clone and multiple subclones. It is not known whether specific gene mutations typically occur in founding clones or subclones. We screened a panel of 94 candidate genes in a cohort of 157 patients with MDS or secondary acute myeloid leukemia (sAML). This included 150 cases with samples obtained at MDS diagnosis and 15 cases with samples obtained at sAML transformation (8 were also analyzed at the MDS stage). We performed whole-genome sequencing (WGS) to define the clonal architecture in eight sAML genomes and identified the range of variant allele frequencies (VAFs) for founding clone mutations. At least one mutation or cytogenetic abnormality was detected in 83% of the 150 MDS patients and 17 genes were significantly mutated (false discovery rate ⩽0.05). Individual genes and patient samples displayed a wide range of VAFs for recurrently mutated genes, indicating that no single gene is exclusively mutated in the founding clone. The VAFs of recurrently mutated genes did not fully recapitulate the clonal architecture defined by WGS, suggesting that comprehensive sequencing may be required to accurately assess the clonal status of recurrently mutated genes in MDS.
Briefings in Bioinformatics | 2012
Weizhong Li; Limin Fu; Beifang Niu; Sitao Wu; John C. Wooley
The rapid advances of high-throughput sequencing technologies dramatically prompted metagenomic studies of microbial communities that exist at various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters.
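The idea of replacing a redundant data set with one representative per cluster can be sketched as a greedy pass over the sequences. This is an illustration only, not the algorithm of any tool named on this page: the function names, the threshold, and the shared k-mer similarity are all assumptions chosen to keep the example short.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Fraction of the smaller k-mer set that is shared (illustrative stand-in
    for a real sequence-identity computation)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        # Sequences shorter than k: fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def greedy_cluster(sequences, threshold=0.9, k=3):
    """Greedy incremental clustering.

    Sequences are visited longest first. Each one joins the first cluster
    whose representative is similar enough; otherwise it starts a new
    cluster and becomes that cluster's representative.
    """
    clusters = []  # list of (representative, members)
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if similarity(representative, seq, k) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTA", "TTTTGGGGCC", "ACGTACGTAC"]
    for representative, members in greedy_cluster(reads):
        print(representative, len(members))
```

The non-redundant set is the list of representatives. A consensus per cluster, as the abstract mentions, would need an alignment step that this sketch leaves out.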
Nature Communications | 2015
Charles Lu; Mingchao Xie; Michael C. Wendl; Jiayin Wang; Michael D. McLellan; Mark D. M. Leiserson; Kuan-lin Huang; Matthew A. Wyczalkowski; Reyka Jayasinghe; Tapahsama Banerjee; Jie Ning; Piyush Tripathi; Qunyuan Zhang; Beifang Niu; Kai Ye; Heather K. Schmidt; Robert S. Fulton; Joshua F. McMichael; Prag Batra; Cyriac Kandoth; Maheetha Bharadwaj; Daniel C. Koboldt; Christopher A. Miller; Krishna L. Kanchi; James M. Eldred; David E. Larson; John S. Welch; Ming You; Bradley A. Ozenberger; Ramaswamy Govindan
Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cases from The Cancer Genome Atlas cancer cases representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.
Bioinformatics | 2014
Beifang Niu; Kai Ye; Qunyuan Zhang; Charles Lu; Mingchao Xie; Michael D. McLellan; Michael C. Wendl; Li Ding
Motivation: Microsatellite instability (MSI) is an important indicator of larger genome instability and has been linked to many genetic diseases, including Lynch syndrome. MSI status is also an independent prognostic factor for favorable survival in multiple cancer types, such as colorectal and endometrial. It also informs the choice of chemotherapeutic agents. However, the current PCR-electrophoresis-based detection procedure is laborious and time-consuming, often requiring visual inspection to categorize samples. Results: We developed MSIsensor, a C++ program for automatically detecting somatic microsatellite changes. It computes length distributions of microsatellites per site in paired tumor and normal sequence data, subsequently using these to statistically compare observed distributions in both samples. Comprehensive testing indicates MSIsensor is an efficient and effective tool for deriving MSI status from standard tumor-normal paired sequence data. Availability and implementation: https://github.com/ding-lab/msisensor
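The per-site comparison described in this abstract can be illustrated with a small sketch. It is not MSIsensor's code: the abstract does not name the statistical test, so the Pearson chi-square statistic, the permutation p-value, and every function and parameter name below are assumptions made for illustration. The inputs are the repeat lengths observed in reads covering one microsatellite site in the normal and the tumor sample.

```python
import random
from collections import Counter


def chi_square_statistic(normal_lengths, tumor_lengths):
    """Pearson chi-square statistic for a 2 x k table (sample x repeat length)."""
    if not normal_lengths or not tumor_lengths:
        raise ValueError("both samples need at least one observed repeat length")
    normal = Counter(normal_lengths)
    tumor = Counter(tumor_lengths)
    n_total = len(normal_lengths)
    t_total = len(tumor_lengths)
    grand_total = n_total + t_total
    statistic = 0.0
    for length in set(normal) | set(tumor):
        column_total = normal[length] + tumor[length]
        for observed, row_total in ((normal[length], n_total),
                                    (tumor[length], t_total)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def permutation_p_value(normal_lengths, tumor_lengths,
                        n_permutations=2000, seed=0):
    """Estimate how often shuffled sample labels give a statistic at least
    as large as the observed one (assumed test, see lead-in)."""
    rng = random.Random(seed)
    observed = chi_square_statistic(normal_lengths, tumor_lengths)
    pooled = list(normal_lengths) + list(tumor_lengths)
    n = len(normal_lengths)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if chi_square_statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    normal = [10] * 30               # stable site in the normal sample
    tumor = [10] * 10 + [8] * 20     # many tumor reads lost two repeat units
    print(permutation_p_value(normal, tumor))
```

A site whose tumor distribution differs significantly from the normal one would be a candidate somatic microsatellite change. How per-site results are combined into a sample-level MSI status is not covered by this sketch.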
Nature Genetics | 2016
Beifang Niu; Adam Scott; Sohini Sengupta; Matthew Bailey; Prag Batra; Jie Ning; Matthew A. Wyczalkowski; Wen-Wei Liang; Qunyuan Zhang; Michael D. McLellan; Sam Q. Sun; Piyush Tripathi; Carolyn Lou; Kai Ye; R. Jay Mashl; John W. Wallis; Michael C. Wendl; Feng Chen; Li Ding
Local concentrations of mutations are well known in human cancers. However, their three-dimensional spatial relationships in the encoded protein have yet to be systematically explored. We developed a computational tool, HotSpot3D, to identify such spatial hotspots (clusters) and to interpret the potential function of variants within them. We applied HotSpot3D to >4,400 TCGA tumors across 19 cancer types, discovering >6,000 intra- and intermolecular clusters, some of which showed tumor and/or tissue specificity. In addition, we identified 369 rare mutations in genes including TP53, PTEN, VHL, EGFR, and FBXW7 and 99 medium-recurrence mutations in genes such as RUNX1, MTOR, CA3, PI3, and PTPN11, all mapping within clusters having potential functional implications. As a proof of concept, we validated our predictions in EGFR using high-throughput phosphorylation data and cell-line-based experimental evaluation. Finally, mutation–drug cluster and network analysis predicted over 800 promising candidates for druggable mutations, raising new possibilities for designing personalized treatments for patients carrying specific mutations.