Publication


Featured research published by Peifen Zhang.


Nucleic Acids Research | 2007

The Arabidopsis Information Resource (TAIR): gene structure and function annotation

David Swarbreck; Christopher Wilks; Philippe Lamesch; Tanya Z. Berardini; Margarita Garcia-Hernandez; Hartmut Foerster; Donghui Li; Tom Meyer; Robert J. Muller; Larry Ploetz; Amie Radenbaugh; Shanker Singh; Vanessa Swing; Christophe Tissier; Peifen Zhang; Eva Huala

The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is the model organism database for the fully sequenced and intensively studied model plant Arabidopsis thaliana. Data in TAIR is derived in large part from manual curation of the Arabidopsis research literature and direct submissions from the research community. New developments at TAIR include the addition of the GBrowse genome viewer to the TAIR site, a redesigned home page, navigation structure and portal pages to make the site more intuitive and easier to use, the launch of several TAIR web services and a new genome annotation release (TAIR7) in April 2007. A combination of manual and computational methods was used to generate this release, which contains 27 029 protein-coding genes, 3889 pseudogenes or transposable elements and 1123 ncRNAs (32 041 genes in all, 37 019 gene models). A total of 681 new genes and 1002 new splice variants were added. Overall, 10 098 loci (one-third of all loci from the previous TAIR6 release) were updated for the TAIR7 release.


Nucleic Acids Research | 2003

The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community

Seung Y. Rhee; William D. Beavis; Tanya Z. Berardini; Guanghong Chen; David A. Dixon; Aisling Doyle; Margarita Garcia-Hernandez; Eva Huala; Gabriel C. Lander; Mary Montoya; Neil Miller; Lukas A. Mueller; Suparna Mundodi; Leonore Reiser; Julie Tacklind; Dan C. Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang

Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive on-line resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have been focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis and visualization tools. New information includes sequence polymorphisms including alleles, germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows expression data to be overlaid onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart-style on-line ordering system.


Nucleic Acids Research | 2004

MetaCyc: a multiorganism database of metabolic pathways and enzymes

Cynthia J. Krieger; Peifen Zhang; Lukas A. Mueller; Alfred Wang; Suzanne M. Paley; Martha Arnaud; John Pick; Seung Y. Rhee; Peter D. Karp

The MetaCyc database (see URL http://MetaCyc.org) is a collection of metabolic pathways and enzymes from a wide variety of organisms, primarily microorganisms and plants. The goal of MetaCyc is to contain a representative sample of each experimentally elucidated pathway, and thereby to catalog the universe of metabolism. MetaCyc also describes reactions, chemical compounds and genes. Many of the pathways and enzymes in MetaCyc contain extensive information, including comments and literature citations. SRI's Pathway Tools software supports querying, visualization and curation of MetaCyc. With its wide breadth and depth of metabolic information, MetaCyc is a valuable resource for a variety of applications. MetaCyc is the reference database of pathways and enzymes that is used in conjunction with SRI's metabolic pathway prediction program to create Pathway/Genome Databases that can be augmented with curation from the scientific literature and published on the World Wide Web. MetaCyc also serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. In the past 2 years the data content and the Pathway Tools software used to query, visualize and edit MetaCyc have been expanded significantly. These enhancements are described in this paper.


Plant Physiology | 2004

Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies

Tanya Z. Berardini; Suparna Mundodi; Leonore Reiser; Eva Huala; Margarita Garcia-Hernandez; Peifen Zhang; Lukas A. Mueller; Jungwoon Yoon; Aisling Doyle; Gabriel C. Lander; Nick Moseyko; Danny Yoo; Iris Xu; Brandon Zoeckler; Mary Montoya; Neil Miller; Dan C. Weems; Seung Y. Rhee

Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we had used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.


Plant Physiology | 2005

MetaCyc and AraCyc. Metabolic Pathway Databases for Plant Research

Peifen Zhang; Hartmut Foerster; Christophe Tissier; Lukas A. Mueller; Suzanne M. Paley; Peter D. Karp; Seung Y. Rhee

MetaCyc (http://metacyc.org) contains experimentally determined biochemical pathways to be used as a reference database for metabolism. In conjunction with the Pathway Tools software, MetaCyc can be used to computationally predict the metabolic pathway complement of an annotated genome. To increase the breadth of pathways and enzymes, more than 60 plant-specific pathways have recently been added to or updated in MetaCyc. In contrast to MetaCyc, which contains metabolic data for a wide range of organisms, AraCyc is a species-specific database containing only enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). AraCyc (http://arabidopsis.org/tools/aracyc/) was the first computationally predicted plant metabolism database derived from MetaCyc. Since its initial computational build, AraCyc has been under continued curation to enhance data quality and to increase breadth of pathway coverage. Twenty-eight pathways have recently been manually curated from the literature. Pathway predictions in AraCyc have also recently been updated with the latest functional annotations of Arabidopsis genes, which use controlled vocabulary and literature evidence. AraCyc currently features 1,418 unique genes mapped onto 204 pathways with 1,156 literature citations. The Omics Viewer, a user data visualization and analysis tool, allows a list of genes, enzymes, or metabolites with experimental values to be painted on a diagram of the full pathway map of AraCyc. Other recent enhancements to both MetaCyc and AraCyc include implementation of an evidence ontology, which has been used to provide information on data quality, expansion of the secondary metabolism node of the pathway ontology to accommodate curation of secondary metabolic pathways, and enhancement of the cellular component ontology for storing and displaying enzyme and pathway locations within subcellular compartments.


Functional & Integrative Genomics | 2002

TAIR: a resource for integrated Arabidopsis data

Margarita Garcia-Hernandez; Tanya Z. Berardini; Guanghong Chen; Debbie Crist; Aisling Doyle; Eva Huala; Emma M. Knee; Mark Lambrecht; Neil Miller; Lukas A. Mueller; Suparna Mundodi; Leonore Reiser; Seung Y. Rhee; Randy Scholl; Julie Tacklind; Dan C. Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang

The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) provides an integrated view of genomic data for Arabidopsis thaliana. The information is obtained from a battery of sources, including the Arabidopsis user community, the literature, and the major genome centers. Currently TAIR provides information about genes, markers, polymorphisms, maps, sequences, clones, DNA and seed stocks, gene families and proteins. In addition, users can find Arabidopsis publications and information about Arabidopsis researchers. Our emphasis is now on incorporating functional annotations of genes and gene products, genome-wide expression, and biochemical pathway data. Among the tools developed at TAIR, the most notable is the Sequence Viewer, which displays gene annotation, clones, transcripts, markers and polymorphisms on the Arabidopsis genome, and allows zooming in to the nucleotide level. A recently released tool is AraCyc, which is designed for visualization of biochemical pathways. We are also developing tools to extract information from the literature in a systematic way, and building controlled vocabularies to describe biological concepts in collaboration with other database groups. A significant new feature is the integration of the ABRC database functions and stock ordering system, which allows users to place orders for seed and DNA stocks directly from the TAIR site.


Plant Physiology | 2010

Creation of a Genome-Wide Metabolic Pathway Database for Populus trichocarpa Using a New Approach for Reconstruction and Curation of Metabolic Pathways for Plants

Peifen Zhang; Kate Dreher; A. Karthikeyan; Anjo Chi; Anuradha Pujar; Ron Caspi; Peter D. Karp; Vanessa Kirkup; Mario Latendresse; Cynthia Lee; Lukas A. Mueller; Robert J. Muller; Seung Y. Rhee

Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org).


Molecular Plant | 2016

iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases

Yi Zheng; Chen Jiao; Honghe Sun; Hernan G. Rosli; Marina A. Pombo; Peifen Zhang; Michael Banf; Xinbin Dai; Gregory B. Martin; James J. Giovannoni; Patrick Xuechun Zhao; Seung Y. Rhee; Zhangjun Fei


Plant Physiology | 2017

Genome-wide prediction of metabolic enzymes, pathways and gene clusters in plants

Pascal Schläpfer; Peifen Zhang; Chuan Wang; Taehyong Kim; Michael Banf; Lee Chae; Kate Dreher; Arvind K. Chavali; Ricardo Nilo-Poyanco; Thomas Bernard; Daniel Kahn; Seung Y. Rhee

A computational pipeline generates high-quality and genome-scale sets of metabolic enzymes, pathways, and gene clusters from plant genomes. Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments, but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, eight of which had previously been characterized as functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.


Archive | 2006

AraCyc: Overview of an Arabidopsis Metabolism Database and its Applications for Plant Research

Seung Y. Rhee; Peifen Zhang; H. Foerster; Christophe Tissier

We are currently experiencing a rapidly increasing rate of production of large-scale data such as genome sequences, genome-wide gene expression profiles, proteomics and metabolomics data. The need to organize all of these data into a biological framework has been, in part, the motivation for the work described in this review. While we have created a comprehensive database that describes the metabolic network of a model plant species, Arabidopsis thaliana, the database is far from being either complete or error-free. Many of the pathways need manual curation using the current literature, and many more pathways, particularly those for secondary metabolism and those that include transport reactions, need to be brought into the database. As with any other database project, the content of the AraCyc database is dynamic and will continue to undergo enhancements, additions, and modifications to make it more useful.

Collaboration

Top Co-Authors

Seung Y. Rhee

Carnegie Institution for Science

Lukas A. Mueller

Boyce Thompson Institute for Plant Research

Eva Huala

Carnegie Institution for Science

Margarita Garcia-Hernandez

Carnegie Institution for Science

Tanya Z. Berardini

Carnegie Institution for Science

Aisling Doyle

Carnegie Institution for Science

Christophe Tissier

Carnegie Institution for Science

Dan C. Weems

National Center for Genome Resources

Iris Xu

Carnegie Institution for Science
