Kai Wang

Publication

Featured research published by Kai Wang.

Nature Genetics | 2010

Andre Franke; Dermot McGovern; Jeffrey C. Barrett; Kai Wang; Graham L. Radford-Smith; Tariq Ahmad; Charlie W. Lees; Tobias Balschun; James C. Lee; Rebecca L. Roberts; Carl A. Anderson; Joshua C. Bis; Suzanne Bumpstead; David Ellinghaus; Eleonora M. Festen; Michel Georges; Todd Green; Talin Haritunians; Luke Jostins; Anna Latiano; Christopher G. Mathew; Grant W. Montgomery; Natalie J. Prescott; Soumya Raychaudhuri; Jerome I. Rotter; Philip Schumm; Yashoda Sharma; Lisa A. Simms; Kent D. Taylor; David C. Whiteman

We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.

Nature | 2009

Joseph T. Glessner; Kai Wang; Guiqing Cai; Olena Korvatska; Cecilia E. Kim; Shawn Wood; Haitao Zhang; Annette Estes; Camille W. Brune; Jonathan P. Bradfield; Marcin Imielinski; Edward C. Frackelton; Jennifer Reichert; Emily L. Crawford; Jeffrey Munson; Patrick Sleiman; Rosetta M. Chiavacci; Kiran Annaiah; Kelly Thomas; Cuiping Hou; Wendy Glaberson; James H. Flory; Frederick G. Otieno; Maria Garris; Latha Soorya; Lambertus Klei; Joseph Piven; Kacie J. Meyer; Evdokia Anagnostou; Takeshi Sakurai

Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins. Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs. Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with ∼550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 × 10⁻³). Furthermore, CNVs within or surrounding genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs not observed in controls (P = 3.3 × 10⁻³). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 × 10⁻⁶). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.

Nature | 2009

Kai Wang; Haitao Zhang; Deqiong Ma; Maja Bucan; Joseph T. Glessner; Brett S. Abrahams; Daria Salyakina; Marcin Imielinski; Jonathan P. Bradfield; Patrick Sleiman; Cecilia E. Kim; Cuiping Hou; Edward C. Frackelton; Rosetta M. Chiavacci; Nagahide Takahashi; Takeshi Sakurai; Eric Rappaport; Clara M. Lajonchere; Jeffrey Munson; Annette Estes; Olena Korvatska; Joseph Piven; Lisa I. Sonnenblick; Ana I. Alvarez Retuerto; Edward I. Herman; Hongmei Dong; Ted Hutman; Marian Sigman; Sally Ozonoff; Ami Klin

Autism spectrum disorders (ASDs) represent a group of childhood neurodevelopmental and neuropsychiatric disorders characterized by deficits in verbal communication, impairment of social interaction, and restricted and repetitive patterns of interests and behaviour. To identify common genetic risk factors underlying ASDs, here we present the results of genome-wide association studies on a cohort of 780 families (3,101 subjects) with affected children, and a second cohort of 1,204 affected subjects and 6,491 control subjects, all of whom were of European ancestry. Six single nucleotide polymorphisms between cadherin 10 (CDH10) and cadherin 9 (CDH9), two genes encoding neuronal cell-adhesion molecules, revealed strong association signals, with the most significant SNP being rs4307059 (P = 3.4 × 10⁻⁸, odds ratio = 1.19). These signals were replicated in two independent cohorts, with combined P values ranging from 7.4 × 10⁻⁸ to 2.1 × 10⁻¹⁰. Our results implicate neuronal cell-adhesion molecules in the pathogenesis of ASDs, and represent, to our knowledge, the first demonstration of genome-wide significant association of common variants with susceptibility to ASDs.

Nature | 2008

Mattias Jakobsson; Sonja W. Scholz; Paul Scheet; J. Raphael Gibbs; Jenna M. VanLiere; Hon Chung Fung; Zachary A. Szpiech; James H. Degnan; Kai Wang; Rita Guerreiro; Jose Bras; Jennifer C. Schymick; Dena Hernandez; Bryan J. Traynor; Javier Simón-Sánchez; Mar Matarin; Angela Britton; Joyce van de Leemput; Ian Rafferty; Maja Bucan; Howard M. Cann; John Hardy; Noah A. Rosenberg; Andrew Singleton

Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups. Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected—including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas—the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.

Nature Genetics | 2009

Marcin Imielinski; Robert N. Baldassano; Anne M. Griffiths; Richard K. Russell; Vito Annese; Marla Dubinsky; Subra Kugathasan; Jonathan P. Bradfield; Thomas D. Walters; Patrick Sleiman; Cecilia E. Kim; Aleixo M. Muise; Kai Wang; Joseph T. Glessner; Shehzad A. Saeed; Haitao Zhang; Edward C. Frackelton; Cuiping Hou; James H. Flory; George Otieno; Rosetta M. Chiavacci; Robert W. Grundmeier; M. Castro; Anna Latiano; Bruno Dallapiccola; Joanne M. Stempak; Debra J. Abrams; Kent D. Taylor; Dermot McGovern; Melvin B. Heyman

The inflammatory bowel diseases (IBD) Crohn's disease and ulcerative colitis are common causes of morbidity in children and young adults in the western world. Here we report the results of a genome-wide association study in early-onset IBD involving 3,426 affected individuals and 11,963 genetically matched controls recruited through international collaborations in Europe and North America, thereby extending the results from a previous study of 1,011 individuals with early-onset IBD. We have identified five new regions associated with early-onset IBD susceptibility, including 16p11 near the cytokine gene IL27 (rs8049439, P = 2.41 × 10⁻⁹), 22q12 (rs2412973, P = 1.55 × 10⁻⁹), 10q22 (rs1250550, P = 5.63 × 10⁻⁹), 2q37 (rs4676410, P = 3.64 × 10⁻⁸) and 19q13.11 (rs10500264, P = 4.26 × 10⁻¹⁰). Our scan also detected associations at 23 of 32 loci previously implicated in adult-onset Crohn's disease and at 8 of 17 loci implicated in adult-onset ulcerative colitis, highlighting the close pathogenetic relationship between early- and adult-onset IBD.

PLOS Genetics | 2009

Maja Bucan; Brett S. Abrahams; Kai Wang; Joseph T. Glessner; Edward I. Herman; Lisa I. Sonnenblick; Ana I. Alvarez Retuerto; Marcin Imielinski; Dexter Hadley; Jonathan P. Bradfield; Cecilia Kim; Nicole Gidaya; Ingrid Lindquist; Ted Hutman; Marian Sigman; Vlad Kustanovich; Clara M. Lajonchere; Andrew Singleton; Junhyong Kim; Thomas H. Wassink; William M. McMahon; Thomas Owley; John A. Sweeney; Hilary Coon; John I. Nurnberger; Mingyao Li; Rita M. Cantor; Nancy J. Minshew; James S. Sutcliffe; Edwin H. Cook

The genetics underlying the autism spectrum disorders (ASDs) is complex and remains poorly understood. Previous work has demonstrated an important role for structural variation in a subset of cases, but has lacked the resolution necessary to move beyond detection of large regions of potential interest to identification of individual genes. To pinpoint genes likely to contribute to ASD etiology, we performed high density genotyping in 912 multiplex families from the Autism Genetics Resource Exchange (AGRE) collection and contrasted results to those obtained for 1,488 healthy controls. Through prioritization of exonic deletions (eDels), exonic duplications (eDups), and whole gene duplication events (gDups), we identified more than 150 loci harboring rare variants in multiple unrelated probands, but no controls. Importantly, 27 of these were confirmed on examination of an independent replication cohort comprised of 859 cases and an additional 1,051 controls. Rare variants at known loci, including exonic deletions at NRXN1 and whole gene duplications encompassing UBE3A and several other genes in the 15q11–q13 region, were observed in the course of these analyses. Strong support was likewise observed for previously unreported genes such as BZRAP1, an adaptor molecule known to regulate synaptic transmission, with eDels or eDups observed in twelve unrelated cases but no controls (p = 2.3 × 10⁻⁵). Less is known about MDGA2, likewise observed to be case-specific (p = 1.3 × 10⁻⁴). But, it is notable that the encoded protein shows an unexpectedly high similarity to Contactin 4 (BLAST E-value = 3 × 10⁻³⁹), which has also been linked to disease. That hundreds of distinct rare variants were each seen only once further highlights complexity in the ASDs and points to the continued need for larger cohorts.

Genome Medicine | 2013

Jason O'Rawe; Tao Jiang; Guangqing Sun; Yiyang Wu; Wei Min Wang; Jingchu Hu; Paul Bodily; Lifeng Tian; Hakon Hakonarson; W. Evan Johnson; Zhi Wei; Kai Wang; Gholson J. Lyon

Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage. Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family. Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
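
As an illustration of how a concordance figure like the ones reported in this abstract might be computed, here is a minimal Python sketch. It assumes that concordance means the number of variants called by every pipeline divided by the number called by at least one pipeline; the abstract does not spell out the exact definition, so that choice is an assumption. The function names and the toy variant tuples are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# This keying is an assumption made for the sketch.
Variant = Tuple[str, int, str, str]


def concordance(call_sets: Dict[str, Set[Variant]]) -> float:
    """Fraction of all called variants that every pipeline agrees on."""
    sets = list(call_sets.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(call_sets: Dict[str, Set[Variant]], name: str) -> float:
    """Fraction of all called variants that only the named pipeline reports."""
    others = set().union(*(s for n, s in call_sets.items() if n != name))
    union = others | call_sets[name]
    if not union:
        return 0.0
    return len(call_sets[name] - others) / len(union)


# Toy call sets for three hypothetical pipelines.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 250, "C", "T")
v3 = ("chr2", 300, "G", "A")
v4 = ("chr2", 900, "T", "C")
calls = {
    "pipeline_a": {v1, v2, v3},
    "pipeline_b": {v1, v2, v4},
    "pipeline_c": {v1, v2},
}

print(f"concordance: {concordance(calls):.2f}")
print(f"unique to pipeline_a: {unique_fraction(calls, 'pipeline_a'):.2f}")
```

Exact-match keying works reasonably for SNVs. For indels, the abstract notes that coordinates were left-normalized and intervalized by 20 base pairs before comparison, which this sketch does not attempt.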

Nature Protocols | 2006

Adam A. Margolin; Kai Wang; Wei Keat Lim; Manjunath Kustagi; Ilya Nemenman

We describe a computational protocol for the ARACNE algorithm, an information-theoretic method for identifying transcriptional interactions between gene products using microarray expression profile data. Similar to other algorithms, ARACNE predicts potential functional associations among genes, or novel functions for uncharacterized genes, by identifying statistical dependencies between gene products. However, based on biochemical validation, literature searches and DNA binding site enrichment analysis, ARACNE has also proven effective in identifying bona fide transcriptional targets, even in complex mammalian networks. Thus we envision that predictions made by ARACNE, especially when supplemented with prior knowledge or additional data sources, can provide appropriate hypotheses for the further investigation of cellular networks. While the examples in this protocol use only gene expression profile data, the algorithm's theoretical basis readily extends to a variety of other high-throughput measurements, such as pathway-specific or genome-wide proteomics, microRNA and metabolomics data. As these data become readily available, we expect that ARACNE might prove increasingly useful in elucidating the underlying interaction models. For a microarray data set containing ∼10,000 probes, reconstructing the network around a single probe completes in several minutes using a desktop computer with a Pentium 4 processor. Reconstructing a genome-wide network generally requires a computational cluster, especially if the recommended bootstrapping procedure is used.
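
The abstract does not walk through the algorithm's steps. As it is commonly described, ARACNE scores gene pairs by mutual information and then prunes likely indirect interactions using the data processing inequality (DPI): in any fully connected triplet of genes, the weakest edge is removed. The toy Python sketch below illustrates that idea under simplifying assumptions. It swaps in a crude equal-frequency binning estimator for mutual information in place of the estimator used by the published tool, omits the DPI tolerance and bootstrapping, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=3):
    """Assign each value to an equal-frequency bin based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y, bins=3):
    """Plug-in mutual information (in nats) between two binned profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def aracne_sketch(expr, mi_threshold=0.0, bins=3):
    """Return the edges that survive thresholding and DPI pruning.

    expr maps a gene name to its expression profile across samples.
    """
    genes = sorted(expr)
    mi = {
        frozenset((g, h)): mutual_information(expr[g], expr[h], bins)
        for g, h in combinations(genes, 2)
    }
    edges = {e for e, v in mi.items() if v > mi_threshold}
    to_remove = set()
    # DPI step: every fully connected triplet loses its weakest edge.
    # Triplets are judged against the thresholded graph, not the pruned one.
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in edges for e in triangle):
            to_remove.add(min(triangle, key=lambda e: mi[e]))
    return {tuple(sorted(e)): mi[e] for e in edges - to_remove}


# Toy chain X -> Y -> Z: Z tracks Y, which tracks X, so X-Z is indirect.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [1, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12]
z = [1, 2, 3, 5, 4, 6, 7, 9, 8, 10, 11, 12]
network = aracne_sketch({"X": x, "Y": y, "Z": z})
print(sorted(network))
```

In this toy input the X-Z pair has the lowest mutual information of the three, so the DPI step drops it and keeps the two direct edges. Real expression data would need far more samples and a more careful estimator than this binning scheme.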

Human Molecular Genetics | 2015

Chengliang Dong; Peng Wei; Xueqiu Jian; Richard A. Gibbs; Eric Boerwinkle; Kai Wang; Xiaoming Liu

Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their prediction results are sometimes inconsistent with each other and their relative merits are still unclear in practical applications. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthologous approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87 347 044 possible variants in the whole-exome and made them publicly available through the ANNOVAR software and the dbNSFP database.

Nature | 2011

Anjali G. Hinch; Arti Tandon; Nick Patterson; Yunli Song; Nadin Rohland; C. Palmer; Gary K. Chen; Kai Wang; Sarah G. Buxbaum; Ermeg L. Akylbekova; Melinda C. Aldrich; Christine B. Ambrosone; Christopher I. Amos; Elisa V. Bandera; Sonja I. Berndt; Leslie Bernstein; William J. Blot; Cathryn H. Bock; Eric Boerwinkle; Qiuyin Cai; Neil E. Caporaso; Graham Casey; L. Adrienne Cupples; Sandra L. Deming; W. Ryan Diver; Jasmin Divers; Myriam Fornage; Elizabeth M. Gillanders; Joseph T. Glessner; Curtis C. Harris

Recombination, together with mutation, gives rise to genetic variation in populations. Here we leverage the recent mixture of people of African and European ancestry in the Americas to build a genetic map measuring the probability of crossing over at each position in the genome, based on about 2.1 million crossovers in 30,000 unrelated African Americans. At intervals of more than three megabases it is nearly identical to a map built in Europeans. At finer scales it differs significantly, and we identify about 2,500 recombination hotspots that are active in people of West African ancestry but nearly inactive in Europeans. The probability of a crossover at these hotspots is almost fully controlled by the alleles an individual carries at PRDM9 (P value < 10⁻²⁴⁵). We identify a 17-base-pair DNA sequence motif that is enriched in these hotspots, and is an excellent match to the predicted binding target of PRDM9 alleles common in West Africans and rare in Europeans. Sites of this motif are predicted to be risk loci for disease-causing genomic rearrangements in individuals carrying these alleles. More generally, this map provides a resource for research in human genetic variation and evolution.
