Taein Lee | Researchain

Publication

Featured research published by Taein Lee.

Nucleic Acids Research | 2007

GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data

Sook Jung; Margaret Staton; Taein Lee; Anna Blenda; Randall Svancara; Albert G. Abbott; Dorrie Main

The Genome Database for Rosaceae (GDR) is a central repository of curated and integrated genetics and genomics data of Rosaceae, an economically important family which includes apple, cherry, peach, pear, raspberry, rose and strawberry. GDR contains annotated databases of all publicly available Rosaceae ESTs, the genetically anchored peach physical map, Rosaceae genetic maps and comprehensively annotated markers and traits. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, gene ontology terms and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap, the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom, and also through our integrated GDR map viewer, which serves as a portal to the combined genetic, transcriptome and physical mapping information. ESTs, BACs, markers and traits can be queried by various categories and the search result sites are linked to the mapping visualization tools. GDR also provides online analysis tools such as a batch BLAST/FASTA server for the GDR datasets, a sequence assembly server and microsatellite and primer detection tools. GDR is available at http://www.rosaceae.org.

Nucleic Acids Research | 2014

The Genome Database for Rosaceae (GDR): year 10 update

Sook Jung; Stephen P. Ficklin; Taein Lee; Chun-Huai Cheng; Anna Blenda; Ping Zheng; Jing Yu; Aureliano Bombarely; Il-Hyung Cho; Sushan Ru; Kate Evans; Cameron Peace; Albert G. Abbott; Lukas A. Mueller; Mercy A. Olmstead; Dorrie Main

The Genome Database for Rosaceae (GDR, http://www.rosaceae.org), the long-standing central repository and data mining resource for Rosaceae research, has been enhanced with new genomic, genetic and breeding data, and improved functionality. Whole genome sequences of apple, peach and strawberry are available to browse or download with a range of annotations, including gene model predictions, aligned transcripts, repetitive elements, polymorphisms, mapped genetic markers, mapped NCBI Rosaceae genes, gene homologs and association of InterPro protein domains, GO terms and Kyoto Encyclopedia of Genes and Genomes pathway terms. Annotated sequences can be queried using search interfaces and visualized using GBrowse. New expressed sequence tag unigene sets are available for major genera, and pathway data are available through FragariaCyc, AppleCyc and PeachCyc databases. Synteny among the three sequenced genomes can be viewed using GBrowse_Syn. New markers, genetic maps and extensively curated qualitative/Mendelian and quantitative trait loci are available. Phenotype and genotype data from breeding projects and genetic diversity projects are also included. Improved search pages are available for marker, trait locus, genetic diversity and publication data. New search tools for breeders enable selection comparison and assistance with breeding decision making.

Nucleic Acids Research | 2014

CottonGen: a genomics, genetics and breeding database for cotton research

Jing Yu; Sook Jung; Chun-Huai Cheng; Stephen P. Ficklin; Taein Lee; Ping Zheng; Don Jones; Richard G. Percy; Dorrie Main

CottonGen (http://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data for cotton. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST.

BMC Genomics | 2009

Synteny of Prunus and other model plant species

Sook Jung; Derick Jiwan; Il-Hyung Cho; Taein Lee; A. G. Abbott; Bryon Sosinski; Dorrie Main

Background: Fragmentary conservation of synteny has been reported between map-anchored Prunus sequences and Arabidopsis. With the availability of genome sequence for fellow rosid I members Populus and Medicago, we analyzed the synteny between Prunus and the three model genomes. Eight Prunus BAC sequences and map-anchored Prunus sequences were used in the comparison. Results: We found a well conserved synteny across the Prunus species (peach, plum, and apricot) and Populus using a set of homologous Prunus BACs. Conversely, we could not detect any synteny with Arabidopsis in this region. Other peach BACs also showed extensive synteny with Populus. The syntenic regions detected were up to 477 kb in Populus. Two syntenic regions between Arabidopsis and these BACs were much shorter, around 10 kb. We also found syntenic regions that are conserved between the Prunus BACs and Medicago. The array of synteny corresponded with the proposed whole genome duplication events in Populus and Medicago. Using map-anchored Prunus sequences, we detected many syntenic blocks with several gene pairs between Prunus and Populus or Arabidopsis. We observed a more complex network of synteny between Prunus-Arabidopsis, indicative of multiple genome duplication and subsequent gene loss in Arabidopsis. Conclusion: Our results show the striking microsynteny between the Prunus BACs and the genomes of Populus and Medicago. In macrosynteny analysis, more distinct Prunus regions were syntenic to Populus than to Arabidopsis.

Database | 2011

Tripal: a construction toolkit for online genome databases

Stephen P. Ficklin; Lacey-Anne Sanderson; Chun-Huai Cheng; Margaret Staton; Taein Lee; Il-Hyung Cho; Sook Jung; Kirstin E. Bett; Doreen Main

As the availability, affordability and magnitude of genomics and genetics research increase, so does the need to provide online access to resulting data and analyses. Availability of a tailored online database is the desire for many investigators or research communities; however, managing the Information Technology infrastructure needed to create such a database can be an undesired distraction from primary research or potentially cost prohibitive. Tripal provides simplified site development by merging the power of Drupal, a popular web Content Management System, with that of Chado, a community-derived database schema for storage of genomic, genetic and other related biological data. Tripal provides an interface that extends the content management features of Drupal to the data housed in Chado. Furthermore, Tripal provides a web-based Chado installer, genomic data loaders, and web-based editing of data for organisms, genomic features, biological libraries, controlled vocabularies and stock collections. Also available are Tripal extensions that support loading and visualizations of NCBI BLAST, InterPro, Kyoto Encyclopedia of Genes and Genomes and Gene Ontology analyses, as well as an extension that provides integration of Tripal with GBrowse, a popular GMOD tool. An Application Programming Interface is available to allow creation of custom extensions by site developers, and the look-and-feel of the site is completely customizable through Drupal-based PHP template files. Addition of non-biological content and user-management is afforded through Drupal. Tripal is an open source and freely available software package found at http://tripal.sourceforge.net.

Database | 2011

The Chado Natural Diversity module: a new generic database schema for large-scale phenotyping and genotyping data

Sook Jung; Naama Menda; Seth Redmond; Robert M. Buels; Maren L. Friesen; Yuri R. Bendaña; Lacey-Anne Sanderson; Hilmar Lapp; Taein Lee; Bob MacCallum; Kirstin E. Bett; Scott Cain; Dave Clements; Lukas A. Mueller; Dorrie Main

Linking phenotypic with genotypic diversity has become a major requirement for basic and applied genome-centric biological research. To meet this need, a comprehensive database backend for efficiently storing, querying and analyzing large experimental data sets is necessary. Chado, a generic, modular, community-based database schema is widely used in the biological community to store information associated with genome sequence data. To meet the need to also accommodate large-scale phenotyping and genotyping projects, a new Chado module called Natural Diversity has been developed. The module strictly adheres to the Chado remit of being generic and ontology driven. The flexibility of the new module is demonstrated in its capacity to store any type of experiment that either uses or generates specimens or stock organisms. Experiments may be grouped or structured hierarchically, whereas any kind of biological entity can be stored as the observed unit, from a specimen to be used in genotyping or phenotyping experiments, to a group of species collected in the field that will undergo further lab analysis. We describe details of the Natural Diversity module, including the design approach, the relational schema and use cases implemented in several databases.

Standards in Genomic Sciences | 2011

Complete genome of the onion pathogen Enterobacter cloacae EcWSU1

Jodi L. Humann; Mark R. Wildung; Chun-Huai Cheng; Taein Lee; Jane E. Stewart; Jennifer C. Drew; Eric W. Triplett; Doreen Main; Brenda K. Schroeder

Previous studies have shown that the members of the Enterobacter cloacae complex are difficult to differentiate with biochemical tests and that, in phylogenetic studies using multilocus sequence analysis, strains of the same species separate into numerous clusters. There are only a few complete E. cloacae genome sequences and very little knowledge about the mechanism of pathogenesis of E. cloacae on plants and humans. Enterobacter cloacae EcWSU1 causes Enterobacter bulb decay in stored onions (Allium cepa). The EcWSU1 genome consists of a 4,734,438 bp chromosome and a mega-plasmid of 63,653 bp. The chromosome has 4,632 protein coding regions, 83 tRNA sequences, and 8 rRNA operons.

Tree Genetics & Genomes | 2012

Uniform standards for genome databases in forest and fruit trees

Jill L. Wegrzyn; Doreen Main; B. Figueroa; M. Choi; J. Yu; David B. Neale; Sook Jung; Taein Lee; M. Stanton; Ping Zheng; Stephen P. Ficklin; Il-Hyung Cho; Cameron Peace; Kate Evans; Gayle M. Volk; Nnadozie Oraguzie; Chunxian Chen; Mercy A. Olmstead; G. Gmitter; A. G. Abbott

TreeGenes and tree fruit Genome Database Resources serve the international forestry and fruit tree genomics research communities, respectively. These databases hold similar sequence data and provide resources for the submission and recovery of this information in order to enable comparative genomics research. Large-scale genotype and phenotype projects have recently spawned the development of independent tools and interfaces within these repositories to deliver information to both geneticists and breeders. The increase in next generation sequencing projects has increased the amount of data as well as the scale of analysis that can be performed. These two repositories are now working towards a similar goal of archiving the diverse, independent data sets generated from genotype/phenotype experiments. This is achieved through focused development on data input standards (templates), pipelines for the storage and automated curation, and consistent annotation efforts through the application of widely accepted ontologies to improve the extraction and exchange of the data for comparative analysis. Efforts towards standardization are not limited to genotype/phenotype experiments but are also being applied to other data types to improve gene prediction and annotation for de novo sequencing projects. The resources developed towards these goals represent the first large-scale coordinated effort in plant databases to add informatics value to diverse genotype/phenotype experiments.

Database | 2013

Addition of a breeding database in the Genome Database for Rosaceae

Kate Evans; Sook Jung; Taein Lee; Lisa J. Brutcher; Il-Hyung Cho; Cameron Peace; Dorrie Main

Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage list individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox

Database | 2016

Chado use case: storing genomic, genetic and breeding data of Rosaceae and Gossypium crops in Chado

Sook Jung; Taein Lee; Stephen P. Ficklin; Jing Yu; Chun-Huai Cheng; Dorrie Main

The Genome Database for Rosaceae (GDR) and CottonGen are comprehensive online data repositories that provide access to integrated genomic, genetic and breeding data through search, visualization and analysis tools for Rosaceae crops and Gossypium (cotton). These online databases use Chado, an open-source, generic and ontology-driven database schema for biological data, as the primary data storage platform. Chado is highly normalized and uses ontologies to indicate the ‘types’ of data. Therefore, Chado is flexible such that it has been used to house genomic, genetic and breeding data for GDR and CottonGen. These data include whole genome sequence and annotation, transcripts, molecular markers, genetic maps, Quantitative Trait Loci, Mendelian Trait Loci, traits, germplasm, pedigrees, large scale phenotypic and genotypic data, ontologies and publications. We provide information about how to store these types of data in Chado using GDR and CottonGen as example sites that were converted from an older legacy infrastructure. Database URL: GDR (www.rosaceae.org), CottonGen (www.cottongen.org)
