Publication


Featured research published by Han Fang.


Nature Methods | 2014

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Giuseppe Narzisi; Jason O'Rawe; Ivan Iossifov; Han Fang; Yoon-ha Lee; Zihua Wang; Yiyang Wu; Gholson J. Lyon; Michael Wigler; Michael C. Schatz

We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.


Genome Medicine | 2014

Reducing INDEL calling errors in whole genome and exome sequencing data

Han Fang; Yiyang Wu; Giuseppe Narzisi; Jason O'Rawe; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.


Bioinformatics | 2017

GenomeScope: fast reference-free genome profiling from short reads

Gregory W. Vurture; Fritz J. Sedlazeck; Maria Nattestad; Charles J. Underwood; Han Fang; James Gurtowski; Michael C. Schatz

Summary: GenomeScope is an open‐source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates. Availability and Implementation: http://genomescope.org, https://github.com/schatzlab/genomescope.git. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


American Journal of Human Genetics | 2015

TAF1 Variants Are Associated with Dysmorphic Features, Intellectual Disability, and Neurological Manifestations

Jason A. O’Rawe; Yiyang Wu; Max J. Dörfel; Alan F. Rope; P.Y. Billie Au; Jillian S. Parboosingh; Sungjin Moon; Maria Kousi; Konstantina Kosma; Christopher Smith; Maria Tzetis; Jane L. Schuette; Robert B. Hufnagel; Carlos E. Prada; Francisco Venegas Martínez; Carmen Orellana; Jonathan Crain; Alfonso Caro-Llopis; Silvestre Oltra; Sandra Monfort; Laura T. Jiménez-Barrón; Jeffrey Swensen; Sara Ellingwood; Rosemarie Smith; Han Fang; Sandra Ospina; Sander Stegmann; Nicolette S. den Hollander; David Mittelman; Gareth Highnam

We describe an X-linked genetic syndrome associated with mutations in TAF1 and manifesting with global developmental delay, intellectual disability (ID), characteristic facial dysmorphology, generalized hypotonia, and variable neurologic features, all in male individuals. Simultaneous studies using diverse strategies led to the identification of nine families with overlapping clinical presentations and affected by de novo or maternally inherited single-nucleotide changes. Two additional families harboring large duplications involving TAF1 were also found to share phenotypic overlap with the probands harboring single-nucleotide changes, but they also demonstrated a severe neurodegeneration phenotype. Functional analysis with RNA-seq for one of the families suggested that the phenotype is associated with downregulation of a set of genes notably enriched with genes regulated by E-box proteins. In addition, knockdown and mutant studies of this gene in zebrafish have shown a quantifiable, albeit small, effect on a neuronal phenotype. Our results suggest that mutations in TAF1 play a critical role in the development of this X-linked ID syndrome.


Nature Methods | 2018

Accurate detection of complex structural variations using single-molecule sequencing

Fritz J. Sedlazeck; Philipp Rescheneder; Moritz Smolka; Han Fang; Maria Nattestad; Arndt von Haeseler; Michael C. Schatz

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings. NGMLR and Sniffles perform highly accurate alignment and structural variation detection from long-read sequencing data.


Nature Protocols | 2016

Indel variant analysis of short-read sequencing data with Scalpel

Han Fang; Ewa A. Bergmann; Kanika Arora; Vladimir Vacic; Michael C. Zody; Ivan Iossifov; Jason O'Rawe; Yiyang Wu; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Yoon-ha Lee; Zihua Wang; Esra Dikoglu; Vaidehi Jobanputra; Gholson J. Lyon; Michael Wigler; Michael C. Schatz; Giuseppe Narzisi

As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.


PeerJ | 2013

Integrating precision medicine in the study and clinical treatment of a severely mentally ill person

Jason O'Rawe; Han Fang; Shawn Rynearson; Reid J. Robison; Edward S. Kiruluta; Gerald Higgins; Karen Eilbeck; Martin G. Reese; Gholson J. Lyon

Background. In recent years, there has been an explosion in the number of technical and medical diagnostic platforms being developed. This has greatly improved our ability to more accurately, and more comprehensively, explore and characterize human biological systems on the individual level. Large quantities of biomedical data are now being generated and archived in many separate research and clinical activities, but there exists a paucity of studies that integrate the areas of clinical neuropsychiatry, personal genomics and brain-machine interfaces. Methods. A single person with severe mental illness was implanted with the Medtronic Reclaim® Deep Brain Stimulation (DBS) Therapy device for Obsessive Compulsive Disorder (OCD), targeting his nucleus accumbens/anterior limb of the internal capsule. Programming of the device and psychiatric assessments occurred in an outpatient setting for over two years. His genome was sequenced and variants were detected in the Illumina Whole Genome Sequencing Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Results. We report here the detailed phenotypic characterization, clinical-grade whole genome sequencing (WGS), and two-year outcome of a man with severe OCD treated with DBS. Since implantation, this man has reported steady improvement, highlighted by a steady decline in his Yale-Brown Obsessive Compulsive Scale (YBOCS) score from ∼38 to a score of ∼25. A rechargeable Activa RC neurostimulator battery has been of major benefit in terms of facilitating a degree of stability and control over the stimulation. His psychiatric symptoms reliably worsen within hours of the battery becoming depleted, thus providing confirmatory evidence for the efficacy of DBS for OCD in this person. WGS revealed that he is a heterozygote for the p.Val66Met variant in BDNF, encoding a member of the nerve growth factor family, and which has been found to predispose carriers to various psychiatric illnesses. He carries the p.Glu429Ala allele in methylenetetrahydrofolate reductase (MTHFR) and the p.Asp7Asn allele in ChAT, encoding choline O-acetyltransferase, with both alleles having been shown to confer an elevated susceptibility to psychoses. We have found thousands of other variants in his genome, including pharmacogenetic and copy number variants. This information has been archived and offered to this person alongside the clinical sequencing data, so that he and others can re-analyze his genome for years to come. Conclusions. To our knowledge, this is the first study in the clinical neurosciences that integrates detailed neuropsychiatric phenotyping, deep brain stimulation for OCD and clinical-grade WGS with management of genetic results in the medical treatment of one person with severe mental illness. We offer this as an example of precision medicine in neuropsychiatry including brain-implantable devices and genomics-guided preventive health care.


bioRxiv | 2014

Reducing INDEL errors in whole-genome and exome sequencing

Han Fang; Giuseppe Narzisi; Jason O'Rawe; Yiyang Wu; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low quality INDELs (7% vs. 51%). Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (52%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (85% vs. 54%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (e.g. capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.


Yeast | 2017

Proteomic and genomic characterization of a yeast model for Ogden syndrome

Max J. Dörfel; Han Fang; Jonathan Crain; Michael Klingener; Jake Weiser; Gholson J. Lyon

Naa10 is an Nα‐terminal acetyltransferase that, in a complex with its auxiliary subunit Naa15, co‐translationally acetylates the α‐amino group of newly synthetized proteins as they emerge from the ribosome. Roughly 40–50% of the human proteome is acetylated by Naa10, rendering this an enzyme with one of the most broad substrate ranges known. Recently, we reported an X‐linked disorder of infancy, Ogden syndrome, in two families harbouring a c.109 T > C (p.Ser37Pro) variant in NAA10. In the present study we performed in‐depth characterization of a yeast model of Ogden syndrome. Stress tests and proteomic analyses suggest that the S37P mutation disrupts Naa10 function and reduces cellular fitness during heat shock, possibly owing to dysregulation of chaperone expression and accumulation. Microarray and RNA‐seq revealed a pseudo‐diploid gene expression profile in ΔNaa10 cells, probably responsible for a mating defect. In conclusion, the data presented here further support the disruptive nature of the S37P/Ogden mutation and identify affected cellular processes potentially contributing to the severe phenotype seen in Ogden syndrome. Data are available via GEO under identifier GSE86482 or with ProteomeXchange under identifier PXD004923.


bioRxiv | 2015

Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline

Laura T. Jiménez-Barrón; Jason O'Rawe; Yiyang Wu; Margaret Yoon; Han Fang; Ivan Iossifov; Gholson J. Lyon

Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations.

Collaboration


Dive into Han Fang's collaborations.

Top Co-Authors

Gholson J. Lyon

Cold Spring Harbor Laboratory

Jason O'Rawe

Cold Spring Harbor Laboratory

Martin G. Reese

Lawrence Berkeley National Laboratory

Yiyang Wu

Cold Spring Harbor Laboratory

David Mittelman

Virginia Bioinformatics Institute

Ivan Iossifov

Cold Spring Harbor Laboratory