Yuandan Lee
J. Craig Venter Institute
Publication
Featured research published by Yuandan Lee.
Bioinformatics | 2003
Geo Pertea; Xiaoqiu Huang; Feng Liang; Valentin Antonescu; Razvan Sultana; Svetlana Karamycheva; Yuandan Lee; Joseph White; Foo Cheung; Babak Parvizi; Jennifer Tsai; John Quackenbush
TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
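The cluster-then-assemble idea in this abstract can be illustrated with a minimal sketch. This is not TGICL's actual code: the function name `cluster_by_similarity`, the `(id_a, id_b, identity)` hit format, and the 0.95 identity threshold are assumptions made for illustration. It groups sequences by single-linkage (transitive) clustering over pairwise similarity hits using a union-find structure; each resulting cluster would then be handed to an assembler separately.

```python
from collections import defaultdict


def cluster_by_similarity(seq_ids, hits, min_identity=0.95):
    """Single-linkage clustering of sequence IDs from pairwise hits.

    seq_ids: iterable of sequence identifiers.
    hits: iterable of (id_a, id_b, identity) tuples (hypothetical format).
    min_identity: hypothetical threshold; pairs below it are ignored.
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the cluster root, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    # Sort members and clusters so the output order is deterministic.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["est1", "est2", "est3", "est4", "est5"]
    pairwise = [
        ("est1", "est2", 0.98),
        ("est2", "est3", 0.97),
        ("est4", "est5", 0.80),  # below threshold, so not linked
    ]
    for cluster in cluster_by_similarity(ids, pairwise):
        print(cluster)
```

Because clusters are independent once formed, the per-cluster assembly step is a natural unit of parallel work, which is consistent with the abstract's note about multi-CPU execution.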
Nucleic Acids Research | 2007
Shu Ouyang; Wei Zhu; John A. Hamilton; Haining Lin; Matthew Campbell; Kevin L. Childs; Françoise Thibaud-Nissen; Renae L. Malek; Yuandan Lee; Li Zheng; Joshua Orvis; Brian J. Haas; Jennifer R. Wortman; C. Robin Buell
In The Institute for Genomic Research Rice Genome Annotation project, we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42 653 non-transposable element-related genes encoding 49 472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13 237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24 799 genes (31 739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.
Nucleic Acids Research | 2004
Yuandan Lee; Jennifer Tsai; Sirisha Sunkara; Svetlana Karamycheva; Geo Pertea; Razvan Sultana; Valentin Antonescu; Agnes P. Chan; Foo Cheung; John Quackenbush
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
Plant Physiology | 2003
Catherine M. Ronning; Svetlana Stegalkina; Robert A. Ascenzi; Oleg Bougri; Amy L. Hart; Teresa R. Utterbach; Susan E. Vanaken; Steve B. Riedmuller; Joseph White; Jennifer Cho; Geo Pertea; Yuandan Lee; Svetlana Karamycheva; Razvan Sultana; Jennifer Tsai; John Quackenbush; H. M. Griffiths; Silvia Restrepo; Christine D. Smart; William E. Fry; Rutger Van der Hoeven; Steve Tanksley; Peifen Zhang; Hailing Jin; Miki L. Yamamoto; Barbara Baker; C. Robin Buell
The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pair-wise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We also were able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction of late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.
Cytogenetic and Genome Research | 2003
C.E. Rexroad; Yuandan Lee; J. W. Keele; Svetlana Karamycheva; G. Brown; B. Koop; S.A. Gahr; Y. Palti; John Quackenbush
Expressed sequence tag (EST) projects have produced extremely valuable resources for identifying genes affecting phenotypes of interest. A large-scale EST sequencing project for rainbow trout was initiated to identify and functionally annotate as many unique transcripts as possible. Over 45,000 5′ ESTs were obtained by sequencing clones from a single normalized library constructed using mRNA from six tissues. The production of this sequence data and creation of a rainbow trout Gene Index eliminating redundancy and providing annotation for these sequences will facilitate research in this species.
BMC Genomics | 2005
Willem Albert Rensink; Yuandan Lee; Jia Liu; Stacy Iobst; Shu Ouyang; C. Robin Buell
Background: The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale. Results: All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices. Conclusion: Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.
American Journal of Physiology-Lung Cellular and Molecular Physiology | 1999
G T De Sanctis; Jonathan Singer; Aiping Jiao; Chandri N. Yandava; Yuandan Lee; T. C. Haynes; Eric S. Lander; David R. Beier; Jeffrey M. Drazen
Quantitative trait locus (QTL) mapping was used to identify chromosomal regions contributing to airway hyperresponsiveness in mice. Airway responsiveness to methacholine was measured in A/J and C3H/HeJ parental strains as well as in progeny derived from crosses between these strains. QTL mapping of backcross [(A/J × C3H/HeJ) × C3H/HeJ] progeny (n = 137-227 informative mice for markers tested) revealed two significant linkages to loci on chromosomes 6 and 7. The QTL on chromosome 6 confirms the previous report by others of a linkage in this region in the same genetic backgrounds; the second QTL, on chromosome 7, represents a novel locus. In addition, we obtained suggestive evidence for linkage (logarithm of odds ratio = 1.7) on chromosome 17, which lies in the same region previously identified in a cross between A/J and C57BL/6J mice. Airway responsiveness in a cross between A/J and C3H/HeJ mice is under the control of at least two major genetic loci, with evidence for a third locus that has been previously implicated in an A/J and C57BL/6J cross; this indicates that multiple genetic factors control the expression of this phenotype.
Proceedings of the Fifth International Rice Genetics Symposium | 2007
Shu Ouyang; Wei Zhu; John P. Hamilton; Haining Lin; Matthew Campbell; Yuandan Lee; Rl Malek; Aihui Wang; Qiaoping Yuan; Brian J. Haas; Jennifer R. Wortman; C.R. Buell
A high-quality finished sequence of the rice genome was completed in 2005. However, to maximally use the sequences, quality annotation of the genes and genome features is necessary. The process of annotation is iterative in nature and requires the application and refinement of computational tools coupled with manual curation and evaluation. We are funded by the U.S. National Science Foundation to annotate the rice genome and have constructed pseudomolecules for the 12 Oryza sativa subspecies japonica var. Nipponbare chromosomes, which are publicly available through our project Web site (http://rice.tigr.org). We identified genes, gene models, and other annotation features in the rice genome. We expanded our annotation features to include a rice transcript assembly and its alignment with the rice genome, small noncoding RNAs, simple sequence repeats, as well as single nucleotide polymorphisms and insertions/deletions based on alignment with the indica subspecies. We updated our Oryza repeat database, which has allowed us to better quantify the repetitive sequences within the rice genome, which total 29% of the genome. To assist users in accessing the genome and our annotation, we expanded the content and functions of our Rice Genome Browser such that it supports 37 annotation tracks and data downloads of the underlying annotation data in various formats.
Computational Systems Bioinformatics | 2004
Xiequn Xu; W.B. Barbazuk; Yucheng Feng; K. Schubert; Agnes P. Chan; Geo Pertea; Li Zheng; Foo Cheung; Yuandan Lee
The large size of the maize genome and the expectation that most of the genome is represented by repetitive elements present a challenge to standard genome sequencing techniques. Genome-filtration sequencing techniques may target gene-rich regions in the genome. Two such approaches, methylation filtration and high Cot selection, may provide a rapid and cost-effective alternative to sequencing the maize genome. Approximately 450k sequence reads have been obtained from both methylation filtration (MF) and high Cot (HC) libraries, and these sequences have been clustered and assembled. An analysis was undertaken to examine whether MF and HC enrich for maize genes and target non-identical sequence space. Simple sequence repeat analysis and mapped marker coverage provide gauges to examine whether MF and HC sample identical sequence space. Marker hits also provide evidence that MF and HC enrich for unique genic portions of the genome. Finally, the identification of maize and other protein sequences in the MF and HC sequence sets can indicate the expected fraction and coverage of the maize gene space by MF and HC.
Science | 2003
C. A. Whitelaw; W. B. Barbazuk; Geo Pertea; Agnes P. Chan; Foo Cheung; Yuandan Lee; Li Zheng; S. van Heeringen; Svetlana Karamycheva; Jeffrey L. Bennetzen; Phillip SanMiguel; N. Lakey; J. Bedell; Yinan Yuan; M. A. Budiman; A. Resnick; S. van Aken; Terry Utterback; Steven Riedmuller; M. Williams; Tamara Feldblyum; K. Schubert; Roger N. Beachy; Claire M. Fraser; John Quackenbush