Publication


Featured research published by Yuk Yee Leung.


RNA | 2013

HAMR: high-throughput annotation of modified ribonucleotides

Paul Ryvkin; Yuk Yee Leung; Ian M. Silverman; Micah Childress; Otto Valladares; Isabelle Dragomir; Brian D. Gregory; Li-San Wang

RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology. These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.


Journal of Alzheimer's Disease | 2012

Comparison of xMAP and ELISA Assays for Detecting Cerebrospinal Fluid Biomarkers of Alzheimer's Disease

Li-San Wang; Yuk Yee Leung; Shu-Kai Chang; Susan Leight; Malgorzata Knapik-Czajka; Young Min Baek; Leslie M. Shaw; Virginia M.-Y. Lee; John Q. Trojanowski; Christopher M. Clark

The best-studied biomarkers of Alzheimer's disease (AD) are the pathologically-linked cerebrospinal fluid (CSF) proteins amyloid-β 42 (Aβ(1-42)), total tau (t-tau), and tau phosphorylated on amino acid 181 (p-tau(181)). Many laboratories measure these proteins using enzyme-linked immunosorbent assay (ELISA). Multiplex xMAP Luminex is a semi-automated assay platform with reduced intra-sample variance, which could facilitate its use in CLIA-approved clinical laboratories. CSF concentrations of these three biomarkers reported using xMAP technology differ from those measured by the most commonly used ELISA, confounding attempts to compare results. To develop a model for converting between xMAP and ELISA levels of the three biomarkers, we analyzed CSF samples from 140 subjects (59 AD, 30 controls, 34 with mild cognitive impairment, and 17 with Parkinson's disease, including 1 with dementia). Log-transformation of ELISA and xMAP levels made the variance constant in all three biomarkers and improved the linear regression: t-tau concentrations were highly correlated (r = 0.94); p-tau(181) concentrations by ELISA can be better predicted using both the t-tau and p-tau(181) xMAP values (r = 0.96) as compared to p-tau(181) concentrations alone (r = 0.82); correlation of Aβ(1-42) concentrations was relatively weaker but still high (r = 0.77). Among all six protein/assay combinations, xMAP Aβ(1-42) had the best accuracy for diagnostic classification (88%) between AD and control subjects. In conclusion, our study demonstrates that multiplex xMAP is an appropriate assay platform providing results that can be correlated with research-based ELISA values, facilitating the incorporation of this diagnostic biomarker into routine clinical practice.
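
The abstract above describes converting between xMAP and ELISA levels by linear regression on log-transformed concentrations. A minimal sketch of that general idea follows. It is not the authors' code: the function names `fit_log_linear` and `xmap_to_elisa` are hypothetical, the single-predictor form is an assumption, and the demo numbers are synthetic values made up for illustration, not data from the study.

```python
import math


def fit_log_linear(xmap, elisa):
    """Fit log(elisa) = intercept + slope * log(xmap) by ordinary least squares.

    Returns (intercept, slope, r), where r is the Pearson correlation
    between the two log-transformed series.
    """
    log_x = [math.log(v) for v in xmap]
    log_y = [math.log(v) for v in elisa]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


def xmap_to_elisa(value, intercept, slope):
    """Map one xMAP concentration to the ELISA scale using the fitted model."""
    return math.exp(intercept + slope * math.log(value))


# Synthetic demo values (NOT study data): an exact power-law relationship,
# so the fit should recover slope 1.1 and intercept log(2).
xmap_demo = [40.0, 60.0, 90.0, 130.0]
elisa_demo = [2.0 * v ** 1.1 for v in xmap_demo]
intercept, slope, r = fit_log_linear(xmap_demo, elisa_demo)
```

Fitting on the log scale means the back-transformed model is a power law, `elisa = exp(intercept) * xmap ** slope`. The abstract's two-predictor model for p-tau(181) would need a multiple regression, which this sketch does not cover.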


Nucleic Acids Research | 2016

DASHR: database of small human noncoding RNAs

Yuk Yee Leung; Pavel P. Kuksa; Alexandre Amlie-Wolf; Otto Valladares; Lyle H. Ungar; Sampath Kannan; Brian D. Gregory; Li-San Wang

Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data covers all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ∼48 000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.


Nucleic Acids Research | 2013

CoRAL: predicting non-coding RNAs from small RNA-sequencing data

Yuk Yee Leung; Paul Ryvkin; Lyle H. Ungar; Brian D. Gregory; Li-San Wang

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.


Methods | 2014

Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs

Paul Ryvkin; Yuk Yee Leung; Lyle H. Ungar; Brian D. Gregory; Li-San Wang

Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (<50nt) present in the cell, including fragments derived from snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/.


Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 2015

Identifying amyloid pathology–related cerebrospinal fluid biomarkers for Alzheimer's disease in a multicohort study

Yuk Yee Leung; Jon B. Toledo; Alexey Nefedov; Robi Polikar; Nandini Raghavan; Sharon X. Xie; Michael Farnum; Tim Schultz; Young Min Baek; Vivianna M. Van Deerlin; William T. Hu; David M. Holtzman; Anne M. Fagan; Richard J. Perrin; Murray Grossman; Holly Soares; Mitchel A. Kling; Matthew Mailman; Steven E. Arnold; Vaibhav A. Narayan; Virginia M.-Y. Lee; Leslie M. Shaw; David Baker; Gayle Wittenberg; John Q. Trojanowski; Li-San Wang

The dynamic range of cerebrospinal fluid (CSF) amyloid β (Aβ1–42) measurement does not parallel cognitive changes in Alzheimer's disease (AD) and cognitively normal (CN) subjects across different studies. Therefore, identifying novel proteins to characterize symptomatic AD samples is important.


Archive | 2017

In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR

Pavel P. Kuksa; Yuk Yee Leung; Lee E. Vandivier; Zachary Anderson; Brian D. Gregory; Li-San Wang

RNA molecules are often altered post-transcriptionally by the covalent modification of their nucleotides. These modifications are known to modulate the structure, function, and activity of RNAs. When reverse transcribed into cDNA during RNA sequencing library preparation, atypical (modified) ribonucleotides that affect Watson-Crick base pairing will interfere with reverse transcriptase (RT), resulting in cDNA products with mis-incorporated bases or prematurely terminated RNA products. These interactions with RT can therefore be inferred from mismatch patterns in the sequencing reads, and are distinguishable from simple base-calling errors, single-nucleotide polymorphisms (SNPs), or RNA editing sites. Here, we describe a computational protocol for the in silico identification of modified ribonucleotides from RT-based RNA-seq read-out using the High-throughput Analysis of Modified Ribonucleotides (HAMR) software. HAMR can identify these modifications transcriptome-wide with single nucleotide resolution, and also differentiate between different types of modifications to predict modification identity. Researchers can use HAMR to identify and characterize RNA modifications using RNA-seq data from a variety of common RT-based sequencing protocols such as Poly(A), total RNA-seq, and small RNA-seq.


Bioinformatics | 2018

Functional annotation of genomic variants in studies of late-onset Alzheimer’s disease

Mariusz Butkiewicz; Elizabeth E. Blue; Yuk Yee Leung; Xueqiu Jian; Edoardo Marcora; Alan E. Renton; Amanda Kuzma; Li-San Wang; Daniel C. Koboldt; Jonathan L. Haines; William S. Bush

Motivation: Annotation of genomic variants is an increasingly important and complex part of the analysis of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare-variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. Results: In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%). Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/. Supplementary information: Supplementary data are available at Bioinformatics online.


International Conference on Computational Advances in Bio and Medical Sciences | 2011

Invited: Multiclass RNA function classification using next-generation sequencing

Paul Ryvkin; Yuk Yee Leung; Li-San Wang; Brian D. Gregory

RNA-seq produces detailed information including length, strand and pairing states, which can be leveraged to characterize RNA functional categories using machine-learning approaches. Using fruit fly small-RNA-seq data, we demonstrate that by combining read length correlation with multi-class classifier models, we can classify four non-coding RNA function classes with high precision.


bioRxiv | 2018

Inferring the molecular mechanisms of noncoding Alzheimer's disease-associated genetic variants

Alexandre Amlie-Wolf; Mitchell Tang; Jessica Way; Beth A. Dombroski; Ming Jiang; Nicholas Vrettos; Yi-Fan Chou; Yi Zhao; Amanda Kuzma; Elisabeth E. Mlynarski; Yuk Yee Leung; Christopher D. Brown; Li-San Wang; Gerard D. Schellenberg

INTRODUCTION: We set out to characterize the causal variants, regulatory mechanisms, tissue contexts, and target genes underlying noncoding late-onset Alzheimer’s Disease (LOAD)-associated genetic signals. METHODS: We applied our INFERNO method to the IGAP genome-wide association study (GWAS) data, annotating all potentially causal variants with tissue-specific regulatory activity. Bayesian co-localization analysis of GWAS summary statistics and eQTL data was performed to identify tissue-specific target genes. RESULTS: INFERNO identified enhancer dysregulation in all 19 tag regions analyzed, significant enrichments of enhancer overlaps in the immune-related blood category, and co-localized eQTL signals overlapping enhancers from the matching tissue class in ten regions (ABCA7, BIN1, CASS4, CD2AP, CD33, CELF1, CLU, EPHA1, FERMT2, ZCWPW1). We validated the allele-specific effects of several variants on enhancer function using luciferase expression assays. DISCUSSION: Integrating functional genomics with GWAS signals yielded insights into the regulatory mechanisms, tissue contexts, and genes affected by noncoding genetic variation associated with LOAD risk.

Collaboration


Dive into Yuk Yee Leung's collaborations.

Top Co-Authors

Li-San Wang

University of Pennsylvania

Otto Valladares

University of Pennsylvania

Amanda Kuzma

University of Pennsylvania

Brian D. Gregory

University of Pennsylvania

Leslie M. Shaw

University of Pennsylvania

Liming Qu

University of Pennsylvania
