Publication


Featured research published by Mingyao Li.


Nucleic Acids Research | 2010

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang; Mingyao Li; Hakon Hakonarson

High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
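The stepwise exclusion described in the abstract can be pictured as a chain of filters. The following is only a minimal sketch under stated assumptions: the three steps, the record fields (`gene`, `effect`, `in_dbsnp`, `in_1000g`), and the toy data are hypothetical illustrations, not ANNOVAR's actual protocol, file formats, or command-line interface.

```python
from collections import Counter

# Hypothetical, hand-made variant records. The field names are illustrative
# and are NOT ANNOVAR's input or output format.
variants = [
    {"gene": "GENE_A", "effect": "nonsynonymous", "in_dbsnp": False, "in_1000g": False},
    {"gene": "GENE_A", "effect": "splicing", "in_dbsnp": False, "in_1000g": False},
    {"gene": "GENE_B", "effect": "nonsynonymous", "in_dbsnp": True, "in_1000g": False},
    {"gene": "GENE_C", "effect": "synonymous", "in_dbsnp": False, "in_1000g": False},
    {"gene": "GENE_D", "effect": "nonsynonymous", "in_dbsnp": False, "in_1000g": False},
]


def reduce_variants(records):
    """Apply three assumed exclusion steps and return the survivors of each."""
    # Step 1 (assumed): keep variants likely to change the protein product.
    step1 = [v for v in records if v["effect"] in {"nonsynonymous", "splicing"}]
    # Step 2 (assumed): drop variants already catalogued in public databases.
    step2 = [v for v in step1 if not (v["in_dbsnp"] or v["in_1000g"])]
    # Step 3 (assumed): for a recessive disease, keep only genes that still
    # carry at least two candidate variants.
    per_gene = Counter(v["gene"] for v in step2)
    step3 = [v for v in step2 if per_gene[v["gene"]] >= 2]
    return step1, step2, step3


step1, step2, step3 = reduce_variants(variants)
candidate_genes = sorted({v["gene"] for v in step3})
print(len(variants), len(step1), len(step2), len(step3), candidate_genes)
```

Each step only ever shrinks the candidate set, which is why a few cheap filters can plausibly take millions of variants down to a short gene list.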


American Journal of Human Genetics | 2005

Strong association of the Y402H variant in complement factor H at 1q32 with susceptibility to age-related macular degeneration

Sepideh Zareparsi; Kari Branham; Mingyao Li; Sapna Shah; Robert J. Klein; Jurg Ott; Josephine Hoh; Gonçalo R. Abecasis; Anand Swaroop

Using a large sample of cases and controls from a single center, we show that a T→C substitution in exon 9 (Y402H) of the complement factor H gene is strongly associated with susceptibility to age-related macular degeneration, the most common cause of blindness in the elderly. Frequency of the C allele was 0.61 in cases, versus 0.34 in age-matched controls (P < 1×10⁻²⁴). Genotype frequencies also differ markedly between cases and controls (χ² = 112.68 [2 degrees of freedom]; P < 1×10⁻²⁴). A multiplicative model fits the data well, and we estimate the population frequency of the high-risk C allele to be 0.39 (95% confidence interval 0.36-0.42) and the genotype relative risk to be 2.44 (95% confidence interval 2.08-2.83) for TC heterozygotes and 5.93 (95% confidence interval 4.33-8.02) for CC homozygotes.
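The reported numbers can be checked for internal consistency with a few lines of arithmetic. This is a sketch, assuming Hardy-Weinberg proportions in the population and genotype relative risks of 1, r and r² for zero, one and two copies of the C allele (the usual reading of "multiplicative model"). The function names are illustrative, and the allelic odds ratio is derived here rather than reported in the abstract.

```python
def case_allele_frequency(p, r):
    """Risk-allele frequency among cases under a multiplicative model.

    Assumes Hardy-Weinberg proportions in the population (allele frequency p)
    and genotype relative risks 1, r, r**2 for 0, 1, 2 risk alleles. Case
    genotype frequencies are then proportional to
    (1-p)**2, 2*p*(1-p)*r, (p*r)**2, which is again Hardy-Weinberg with
    allele frequency p*r / (p*r + 1 - p).
    """
    return p * r / (p * r + (1.0 - p))


def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds ratio comparing risk-allele odds in cases versus controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Values quoted in the abstract.
p_population = 0.39  # estimated population frequency of the C allele
grr_het = 2.44       # genotype relative risk for TC heterozygotes
grr_hom = 5.93       # genotype relative risk for CC homozygotes

# Under a multiplicative model the homozygote risk is the heterozygote risk
# squared: about 5.95, close to the reported 5.93.
grr_hom_multiplicative = grr_het ** 2

# Predicted C-allele frequency in cases: about 0.61, matching the abstract.
predicted_case_freq = case_allele_frequency(p_population, grr_het)

# Derived allelic odds ratio from the case and control frequencies.
odds_ratio = allelic_odds_ratio(0.61, 0.34)

print(round(grr_hom_multiplicative, 2), round(predicted_case_freq, 2), round(odds_ratio, 2))
```

The close agreement between 2.44² and 5.93, and between the predicted and observed case allele frequencies, is what "fits the data well" looks like numerically.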


Science | 2011

Widespread RNA and DNA Sequence Differences in the Human Transcriptome

Mingyao Li; Isabel X. Wang; Yun Li; Alan Bruzel; Allison L. Richards; Jonathan M. Toung; Vivian G. Cheung

The transmission of information from DNA to RNA is a critical process. We compared RNA sequences from human B cells of 27 individuals to the corresponding DNA sequences from the same individuals and uncovered more than 10,000 exonic sites where the RNA sequences do not match that of the DNA. All 12 possible categories of discordances were observed. These differences were nonrandom as many sites were found in multiple individuals and in different cell types, including primary skin cells and brain tissues. Using mass spectrometry, we detected peptides that are translated from the discordant RNA sequences and thus do not correspond exactly to the DNA sequences. These widespread RNA-DNA differences in the human transcriptome provide a yet unexplored aspect of genome variation.


Nature Genetics | 2006

CFH haplotypes without the Y402H coding variant show strong association with susceptibility to age-related macular degeneration

Mingyao Li; Pelin Atmaca-Sonmez; Mohammad Othman; Kari Branham; Ritu Khanna; Michael S Wade; Yun Li; Liming Liang; Sepideh Zareparsi; Anand Swaroop; Gonçalo R. Abecasis

In developed countries, age-related macular degeneration is a common cause of blindness in the elderly. A common polymorphism, encoding the sequence variation Y402H in complement factor H (CFH), has been strongly associated with disease susceptibility. Here, we examined 84 polymorphisms in and around CFH in 726 affected individuals (including 544 unrelated individuals) and 268 unrelated controls. In this sample, 20 of these polymorphisms showed stronger association with disease susceptibility than the Y402H variant. Further, no single polymorphism could account for the contribution of the CFH locus to disease susceptibility. Instead, multiple polymorphisms defined a set of four common haplotypes (of which two were associated with disease susceptibility and two seemed to be protective) and multiple rare haplotypes (associated with increased susceptibility in aggregate). Our results suggest that there are multiple disease susceptibility alleles in the region and that noncoding CFH variants play a role in disease susceptibility.


PubMed | 2011

Identification of ADAMTS7 as a novel locus for coronary atherosclerosis and association of ABO with myocardial infarction in the presence of coronary atherosclerosis: two genome-wide association studies.

M. P. Reilly; Mingyao Li; Jiang He; Jane F. Ferguson; Ioannis M. Stylianou; Nehal N. Mehta; Burnett; Joe Devaney; Christopher W. Knouff; Thompson; Benjamin D. Horne; Alexandre F.R. Stewart; Themistocles L. Assimes; Philipp S. Wild; Hooman Allayee; Pl Nitschke; Riyaz S. Patel; Nicola Martinelli; Domenico Girelli; Arshed A. Quyyumi; Jeffrey L. Anderson; J. Erdmann; A. S. Hall; Heribert Schunkert; Thomas Quertermous; Stefan Blankenberg; Stanley L. Hazen; Rebecca L. Roberts; Sekar Kathiresan; Nilesh J. Samani

Background: We tested whether genetic factors distinctly contribute to either development of coronary atherosclerosis or, specifically, to myocardial infarction in existing coronary atherosclerosis.

Methods: We did two genome-wide association studies (GWAS) with coronary angiographic phenotyping in participants of European ancestry. To identify loci that predispose to angiographic coronary artery disease (CAD), we compared individuals who had this disorder (n=12,393) with those who did not (controls, n=7383). To identify loci that predispose to myocardial infarction, we compared patients who had angiographic CAD and myocardial infarction (n=5783) with those who had angiographic CAD but no myocardial infarction (n=3644).

Findings: In the comparison of patients with angiographic CAD versus controls, we identified a novel locus, ADAMTS7 (p=4.98×10⁻¹³). In the comparison of patients with angiographic CAD who had myocardial infarction versus those with angiographic CAD but no myocardial infarction, we identified a novel association at the ABO locus (p=7.62×10⁻⁹). The ABO association was attributable to the glycotransferase-deficient enzyme that encodes the ABO blood group O phenotype previously proposed to protect against myocardial infarction.

Interpretation: Our findings indicate that specific genetic predispositions promote the development of coronary atherosclerosis whereas others lead to myocardial infarction in the presence of coronary atherosclerosis. The relation to specific CAD phenotypes might modify how novel loci are applied in personalised risk assessment and used in the development of novel therapies for CAD.

Funding: The PennCath and MedStar studies were supported by the Cardiovascular Institute of the University of Pennsylvania, by the MedStar Health Research Institute at Washington Hospital Center and by a research grant from GlaxoSmithKline. The funding and support for the other cohorts contributing to the paper are described in the webappendix.


Nucleic Acids Research | 2008

Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms

Sharon J. Diskin; Mingyao Li; Cuiping Hou; Shuzhang Yang; Joseph T. Glessner; Hakon Hakonarson; Maja Bucan; John M. Maris; Kai Wang

Whole-genome microarrays with large-insert clones designed to determine DNA copy number often show variation in hybridization intensity that is related to the genomic position of the clones. We found these ‘genomic waves’ to be present in Illumina and Affymetrix SNP genotyping arrays, confirming that they are not platform-specific. The causes of genomic waves are not well-understood, and they may prevent accurate inference of copy number variations (CNVs). By measuring DNA concentration for 1444 samples and by genotyping the same sample multiple times with varying DNA quantity, we demonstrated that DNA quantity correlates with the magnitude of waves. We further showed that wavy signal patterns correlate best with GC content, among multiple genomic features considered. To measure the magnitude of waves, we proposed a GC-wave factor (GCWF) measure, which is a reliable predictor of DNA quantity (correlation coefficient = 0.994 based on samples with serial dilution). Finally, we developed a computational approach by fitting regression models with GC content included as a predictor variable, and we show that this approach improves the accuracy of CNV detection. With the wide application of whole-genome SNP genotyping techniques, our wave adjustment method will be important for taking full advantage of genotyped samples for CNV analysis.
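The idea of adjusting signal intensities by regressing on GC content can be illustrated with a single-predictor least-squares fit. This is a minimal sketch with synthetic numbers, assuming a plain linear relationship between per-marker signal and local GC content. The function names are hypothetical, and the published method may use a different model and additional terms.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjust_for_gc(signal, gc):
    """Subtract the GC-predicted component, leaving the regression residuals."""
    intercept, slope = fit_line(gc, signal)
    return [s - (intercept + slope * g) for s, g in zip(signal, gc)]


def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example (assumed values): signal drifts upward with GC content.
gc_content = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]
signal = [0.8 * (g - 0.45) + e for g, e in zip(gc_content, noise)]

adjusted = adjust_for_gc(signal, gc_content)
print(correlation(gc_content, signal), correlation(gc_content, adjusted))
```

After adjustment the residual signal is, by construction, uncorrelated with GC content, so any remaining excursions are more likely to reflect real copy number changes than wave artifacts.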


American Journal of Human Genetics | 2005

Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal.

Mingyao Li; Michael Boehnke; Gonçalo R. Abecasis

Once genetic linkage has been identified for a complex disease, the next step is often association analysis, in which single-nucleotide polymorphisms (SNPs) within the linkage region are genotyped and tested for association with the disease. If a SNP shows evidence of association, it is useful to know whether the linkage result can be explained, in part or in full, by the candidate SNP. We propose a novel approach that quantifies the degree of linkage disequilibrium (LD) between the candidate SNP and the putative disease locus through joint modeling of linkage and association. We describe a simple likelihood of the marker data conditional on the trait data for a sample of affected sib pairs, with disease penetrances and disease-SNP haplotype frequencies as parameters. We estimate model parameters by maximum likelihood and propose two likelihood-ratio tests to characterize the relationship of the candidate SNP and the disease locus. The first test assesses whether the candidate SNP and the disease locus are in linkage equilibrium so that the SNP plays no causal role in the linkage signal. The second test assesses whether the candidate SNP and the disease locus are in complete LD so that the SNP or a marker in complete LD with it may account fully for the linkage signal. Our method also yields a genetic model that includes parameter estimates for disease-SNP haplotype frequencies and the degree of disease-SNP LD. Our method provides a new tool for detecting linkage and association and can be extended to study designs that include unaffected family members.


Genome Research | 2011

RNA-sequence analysis of human B-cells

Jonathan M. Toung; Michael Morley; Mingyao Li; Vivian G. Cheung

RNA-sequencing (RNA-seq) allows quantitative measurement of expression levels of genes and their transcripts. In this study, we sequenced complementary DNA fragments of cultured human B-cells and obtained 879 million 50-bp reads comprising 44 Gb of sequence. The results allowed us to study the gene expression profile of B-cells and to determine experimental parameters for sequencing-based expression studies. We identified 20,766 genes and 67,453 of their alternatively spliced transcripts. More than 90% of the genes with multiple exons are alternatively spliced; for most genes, one isoform is predominantly expressed. We found that while chromosomes differ in gene density, the percentage of transcribed genes in each chromosome is less variable. In addition, genes involved in related biological processes are expressed at more similar levels than genes with different functions. Besides characterizing gene expression, we also used the data to investigate the effect of sequencing depth on gene expression measurements. While 100 million reads are sufficient to detect most expressed genes and transcripts, about 500 million reads are needed to measure accurately their expression levels. We provide examples in which deep sequencing is needed to determine the relative abundance of genes and their isoforms. With data from 20 individuals and about 40 million sequence reads per sample, we uncovered only 21 alternatively spliced, multi-exon genes that are not in databases; this result suggests that at this sequence coverage, we can detect most of the known genes. Results from this project are available on the UCSC Genome Browser to allow readers to study the expression and structure of genes in human B-cells.


American Journal of Human Genetics | 2006

Efficient Study Designs for Test of Genetic Association Using Sibship Data and Unrelated Cases and Controls

Mingyao Li; Michael Boehnke; Gonçalo R. Abecasis

Linkage mapping of complex diseases is often followed by association studies between phenotypes and marker genotypes through use of case-control or family-based designs. Given fixed genotyping resources, it is important to know which study designs are the most efficient. To address this problem, we extended the likelihood-based method of Li et al., which assesses whether there is linkage disequilibrium between a disease locus and a SNP, to accommodate sibships of arbitrary size and disease-phenotype configuration. A key advantage of our method is the ability to combine data from different family structures. We consider scenarios for which genotypes are available for unrelated cases, affected sib pairs (ASPs), or only one sibling per ASP. We construct designs that use cases only and others that use unaffected siblings or unrelated unaffected individuals as controls. Different combinations of cases and controls result in seven study designs. We compare the efficiency of these designs when the number of individuals to be genotyped is fixed. Our results suggest that (1) when the disease is influenced by a single gene, the one sibling per ASP-control design is the most efficient, followed by the ASP-control design, and familial cases contribute more association information than singleton cases; (2) when the disease is influenced by multiple genes, familial cases provide more association information than singleton cases, unless the effect of the locus being tested is much smaller than at least one other untested disease locus; and (3) the case-control design can be useful for detecting genes with small effect in the presence of genes with much larger effect. Our findings will be helpful for researchers designing and analyzing complex disease-association studies and will facilitate genotyping resource allocation.


European Journal of Human Genetics | 2008

Evaluation of coverage variation of SNP chips for genome-wide association studies.

Mingyao Li; Chun Li; Weihua Guan

Genome-wide association (GWA) studies for complex human diseases are now feasible. Many GWA studies rely on commercial SNP chips, for which a common evaluation criterion is global coverage of the genome. Although providing an overall evaluation of an SNP chip, the global coverage does not tell us how the coverage varies across the genome, an important feature that should be taken into consideration, as coverage variation often results in power variation and potentially biased search in subsequent association analysis. To achieve a fuller understanding of SNP chip coverage, we conducted detailed evaluation of coverage, including (1) a map of local coverage – calculated over small consecutive genomic regions and (2) gene coverage – calculated for each known gene in the genome. These evaluations can reveal the degree of variation of each SNP chip in covering the genome and can facilitate SNP chip comparisons at a finer scale.
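The contrast between global and local coverage can be sketched in a few lines. The abstract does not spell out the coverage definition, so this example assumes a common one: a reference SNP counts as covered when its best r² with any chip SNP reaches a threshold (0.8 here). The window size, threshold, function name, and data are all illustrative assumptions.

```python
from collections import defaultdict


def local_coverage(snps, window_size, r2_threshold=0.8):
    """Fraction of reference SNPs covered in each consecutive window.

    snps: iterable of (position_bp, best_r2) pairs, where best_r2 is the
    highest r^2 between that reference SNP and any SNP on the chip
    (1.0 if the SNP itself is on the chip).
    Returns a dict mapping window start position to coverage.
    """
    totals = defaultdict(int)
    covered = defaultdict(int)
    for position, best_r2 in snps:
        window = position // window_size
        totals[window] += 1
        if best_r2 >= r2_threshold:
            covered[window] += 1
    return {w * window_size: covered[w] / totals[w] for w in sorted(totals)}


# Hypothetical reference SNPs: (position in bp, best r^2 with a chip SNP).
reference_snps = [
    (1_200, 1.0), (40_000, 0.85), (75_500, 0.30), (99_000, 0.92),
    (120_000, 0.10), (150_000, 0.95),
    (230_000, 0.20), (260_000, 0.40),
]

coverage_by_window = local_coverage(reference_snps, window_size=100_000)
global_coverage = sum(1 for _, r2 in reference_snps if r2 >= 0.8) / len(reference_snps)

print(global_coverage)
print(coverage_by_window)
```

In this toy data the global coverage is 0.5, yet the three windows range from 0.75 down to 0.0, which is exactly the kind of variation a single global figure hides.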

Collaboration


Dive into Mingyao Li's collaborations.

Top Co-Authors

Chun Li

Vanderbilt University

Hakon Hakonarson

Children's Hospital of Philadelphia

Yun Li

University of North Carolina at Chapel Hill

Daniel J. Rader

University of Pennsylvania

Joseph T. Glessner

Children's Hospital of Philadelphia

Struan F. A. Grant

Children's Hospital of Philadelphia
