Publication


Featured research published by Agnes P. Chan.


PLOS ONE | 2012

Predicting the Functional Effect of Amino Acid Substitutions and Indels

Yongwook Choi; Gregory E. Sims; Sean V. Murphy; Jason R. Miller; Agnes P. Chan

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
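
The idea of scoring a variant by the change in alignment similarity can be illustrated with a minimal sketch. This is not the PROVEAN implementation: the abstract does not specify the scoring matrix, gap penalties, or how homologs are selected, so the toy match/mismatch/gap values and the function names `global_alignment_score` and `delta_score` below are assumptions made purely for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score.

    The match/mismatch/gap values are toy assumptions, not the
    parameters used by any published tool.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to a homolog after introducing a variation.

    Negative values mean the variant is less similar to the homolog
    than the original query was.
    """
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


if __name__ == "__main__":
    query = "MKTAYIAK"      # hypothetical query protein
    homolog = "MKTAYIAK"    # hypothetical homolog, identical here for simplicity
    print(delta_score(query, "MKTWYIAK", homolog))  # substitution A4W
    print(delta_score(query, "MKTYIAK", homolog))   # in-frame deletion of A4
```

Because the score is computed on whole-sequence alignments, the same function handles substitutions, insertions, and deletions without special cases, which is the property the abstract highlights.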


Bioinformatics | 2015

PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels

Yongwook Choi; Agnes P. Chan

We present a web server to predict the functional effect of single or multiple amino acid substitutions, insertions and deletions using the prediction tool PROVEAN. The server provides rapid analysis of protein variants from any organism, and also supports high-throughput analysis for human and mouse variants at both the genomic and protein levels.

Availability and implementation: The web server is freely available and open to all users with no login requirements at http://provean.jcvi.org.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.


Nature Biotechnology | 2010

Draft genome sequence of the oilseed species Ricinus communis

Agnes P. Chan; Jonathan Crabtree; Qi Zhao; Hernan Lorenzi; Joshua Orvis; Daniela Puiu; Admasu Melake-Berhan; Kristine M Jones; Julia C. Redman; Grace Q. Chen; Edgar B. Cahoon; Melaku Gedil; Mario Stanke; Brian J. Haas; Jennifer R. Wortman; Claire M. Fraser-Liggett; Jacques Ravel; Pablo D. Rabinowicz

Castor bean (Ricinus communis) is an oilseed crop that belongs to the spurge (Euphorbiaceae) family, which comprises ∼6,300 species that include cassava (Manihot esculenta), rubber tree (Hevea brasiliensis) and physic nut (Jatropha curcas). It is primarily of economic interest as a source of castor oil, used for the production of high-quality lubricants because of its high proportion of the unusual fatty acid ricinoleic acid. However, castor bean genomics is also relevant to biosecurity as the seeds contain high levels of ricin, a highly toxic, ribosome-inactivating protein. Here we report the draft genome sequence of castor bean (4.6-fold coverage), the first for a member of the Euphorbiaceae. Whereas most of the key genes involved in oil synthesis and turnover are single copy, the number of members of the ricin gene family is larger than previously thought. Comparative genomics analysis suggests the presence of an ancient hexaploidization event that is conserved across the dicotyledonous lineage.


Genome Biology | 2015

A novel method of consensus pan-chromosome assembly and large-scale comparative analysis reveal the highly flexible pan-genome of Acinetobacter baumannii

Agnes P. Chan; Granger Sutton; Jessica DePew; Radha Krishnakumar; Yongwook Choi; Xiao-Zhe Huang; Erin Beck; Derek M. Harkins; Maria Kim; Emil Lesho; Mikeljon P. Nikolich; Derrick E. Fouts

Background: Infections by pan-drug resistant Acinetobacter baumannii plague military and civilian healthcare systems. Previous A. baumannii pan-genomic studies used modest sample sizes of low diversity and comparisons to a single reference genome, limiting our understanding of gene order and content. A consensus representation of multiple genomes will provide a better framework for comparison. A large-scale comparative study will identify genomic determinants associated with their diversity and adaptation as a successful pathogen.

Results: We determine draft-level genomic sequence of 50 diverse military isolates and conduct the largest bacterial pan-genome analysis of 249 genomes. The pan-genome of A. baumannii is open when the input genomes are normalized for diversity with 1867 core proteins and a paralog-collapsed pan-genome size of 11,694 proteins. We developed a novel graph-based algorithm and use it to assemble the first consensus pan-chromosome, identifying both the order and orientation of core genes and flexible genomic regions. Comparative genome analyses demonstrate the existence of novel resistance islands and isolates with increased numbers of resistance island insertions over time, from single insertions in the 1950s to triple insertions in 2011. Gene clusters responsible for carbon utilization, siderophore production, and pilus assembly demonstrate frequent gain or loss among isolates.

Conclusions: The highly variable and dynamic nature of the A. baumannii genome may be the result of its success in rapidly adapting to both abiotic and biotic environments through the gain and loss of gene clusters controlling fitness. Importantly, some archaic adaptation mechanisms appear to have reemerged among recent isolates.
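
The terms "core" and "pan-genome" used above can be made concrete with a small sketch. It assumes each genome has already been reduced to a set of protein-cluster identifiers; the clustering step, the paralog collapsing, and the graph-based pan-chromosome assembly described in the abstract are not reproduced here, and the names `core_and_pan` and `pan_genome_growth` are hypothetical.

```python
def core_and_pan(genomes):
    """Return (core, pan) cluster sets.

    genomes: dict mapping genome name -> set of protein-cluster IDs.
    Core = clusters present in every genome; pan = clusters present in any.
    """
    cluster_sets = list(genomes.values())
    if not cluster_sets:
        return set(), set()
    core = set.intersection(*cluster_sets)
    pan = set.union(*cluster_sets)
    return core, pan


def pan_genome_growth(genomes, order):
    """Cumulative pan-genome size as genomes are added in the given order.

    A curve that keeps rising as genomes are added is what is usually
    meant by an "open" pan-genome.
    """
    seen = set()
    sizes = []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


if __name__ == "__main__":
    # Toy data: three hypothetical isolates with made-up cluster IDs.
    genomes = {
        "isolate_A": {"c1", "c2", "c3"},
        "isolate_B": {"c1", "c2", "c4"},
        "isolate_C": {"c1", "c5"},
    }
    core, pan = core_and_pan(genomes)
    print(sorted(core), len(pan))
    print(pan_genome_growth(genomes, ["isolate_A", "isolate_B", "isolate_C"]))
```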


Nucleic Acids Research | 2004

The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes.

Yuandan Lee; Jennifer Tsai; Sirisha Sunkara; Svetlana Karamycheva; Geo Pertea; Razvan Sultana; Valentin Antonescu; Agnes P. Chan; Foo Cheung; John Quackenbush

Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.


Nucleic Acids Research | 2007

The TIGR Plant Transcript Assemblies database

Kevin L. Childs; John P. Hamilton; Wei Zhu; Eugene Ly; Foo Cheung; Hank Wu; Pablo D. Rabinowicz; Christopher D. Town; C. Robin Buell; Agnes P. Chan

The TIGR Plant Transcript Assemblies (TA) database uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include expressed sequence tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences. The TA database includes all plant species for which more than 1000 EST or cDNA sequences are publicly available. The EST and cDNA sequences are first clustered based on an all-versus-all pairwise sequence comparison, followed by the generation of consensus sequences (TAs) from individual clusters. The clustering and assembly procedures use the TGICL tool, Megablast and the CAP3 assembler. The UniProt Reference Clusters (UniRef100) protein database is used as the reference database for the functional annotation of the assemblies. The transcription orientation of each TA is determined based on the orientation of the alignment with the best protein hit. The TA sequences and annotation are available via web interfaces and FTP downloads. Assemblies can be retrieved by a text-based keyword search or a sequence-based BLAST search. The current version of the TA database is Release 2 (July 17, 2006) and includes a total of 215 plant species.
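
The clustering step described above, grouping sequences from all-versus-all pairwise hits before building a consensus per cluster, can be sketched as follows. This is only an illustration: the abstract does not state the similarity thresholds or linkage criteria used by TGICL, so single-linkage grouping with a union-find structure is an assumption, and `cluster_sequences` is a hypothetical helper. The consensus assembly step (CAP3 in the abstract) is not modelled.

```python
def cluster_sequences(seq_ids, similar_pairs):
    """Group sequence IDs into clusters from pairwise similarity hits.

    seq_ids: iterable of sequence identifiers (ESTs, cDNAs).
    similar_pairs: iterable of (id_a, id_b) pairs that passed some
        similarity cutoff in an all-versus-all comparison (assumed input).
    Returns a sorted list of sorted clusters (single-linkage grouping).
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["est1", "est2", "est3", "cdna1", "est4"]   # made-up identifiers
    hits = [("est1", "est2"), ("est2", "cdna1")]      # made-up pairwise hits
    print(cluster_sequences(ids, hits))
```

Sequences with no qualifying hit remain as singleton clusters; each multi-member cluster would then be passed to an assembler to produce one or more consensus sequences.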


BMC Biology | 2005

Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release

Brian J. Haas; Jennifer R. Wortman; Catherine M. Ronning; Linda I. Hannick; R. K. W. Smith; Rama Maiti; Agnes P. Chan; Chunhui Yu; Maryam Farzad; Dongying Wu; Owen White; Christopher D. Town

Background: Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.

Results: Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).

Conclusion: Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.


Plant Physiology | 2003

Annotation of the Arabidopsis Genome

Jennifer R. Wortman; Brian J. Haas; Linda I. Hannick; R. K. W. Smith; Rama Maiti; Catherine M. Ronning; Agnes P. Chan; Chunhui Yu; Mulu Ayele; Catherine A. Whitelaw; Owen R. White; Christopher D. Town

The Arabidopsis Genome Sequencing Project was officially completed in late 2000, leading to the publication of a landmark paper describing, in broad outline, many salient features of the Arabidopsis genome (Arabidopsis Genome Initiative [AGI], 2000). However, the genome annotation, generated by


BMC Plant Biology | 2010

Single nucleotide polymorphisms for assessing genetic diversity in castor bean (Ricinus communis)

Jeffrey T. Foster; Gerard J. Allan; Agnes P. Chan; Pablo D. Rabinowicz; Jacques Ravel; Paul J. Jackson; Paul Keim

Background: Castor bean (Ricinus communis) is an agricultural crop and garden ornamental that is widely cultivated and has been introduced worldwide. Understanding population structure and the distribution of castor bean cultivars has been challenging because of limited genetic variability. We analyzed the population genetics of R. communis in a worldwide collection of plants from germplasm and from naturalized populations in Florida, U.S. To assess genetic diversity we conducted survey sequencing of the genomes of seven diverse cultivars and compared the data to a reference genome assembly of a widespread cultivar (Hale). We determined the population genetic structure of 676 samples using single nucleotide polymorphisms (SNPs) at 48 loci.

Results: Bayesian clustering indicated five main groups worldwide and a repeated pattern of mixed genotypes in most countries. High levels of population differentiation occurred between most populations but this structure was not geographically based. Most molecular variance occurred within populations (74%) followed by 22% among populations, and 4% among continents. Samples from naturalized populations in Florida indicated significant population structuring consistent with local demes. There was significant population differentiation for 56 of 78 comparisons in Florida (pairwise population ϕPT values, p < 0.01).

Conclusion: Low levels of genetic diversity and mixing of genotypes have led to minimal geographic structuring of castor bean populations worldwide. Relatively few lineages occur and these are widely distributed. Our approach of determining population genetic structure using SNPs from genome-wide comparisons constitutes a framework for high-throughput analyses of genetic diversity in plants, particularly in species with limited genetic diversity.


Plant Physiology | 2008

Advancing Cell Biology and Functional Genomics in Maize Using Fluorescent Protein-Tagged Lines

Amitabh Mohanty; Anding Luo; Stacy L. DeBlasio; Xingyuan Ling; Yan Yang; Dorothy E. Tuthill; Katherine E. Williams; Daniel R. Hill; Tara Zadrozny; Agnes P. Chan; Anne W. Sylvester; David Jackson

Genomic resources have significantly impacted plant biology research in recent years. Cell biology has been further enabled by an ongoing revolution in visualization technologies. Using fluorescent proteins (FPs), we now have unprecedented views of cellular architecture, and we can study real-time dynamics of cell structure, function, and protein localization. To date, these technologies have been most widely used in Arabidopsis (Arabidopsis thaliana); however, the grasses provide a unique opportunity to study the underlying mechanisms and inter-related controls of cell growth, morphogenesis, and physiology in leading crop models. Here, we present a resource that leverages the emerging maize (Zea mays) genome sequence to develop tools to study protein structure and function in a cellular context. Traditionally, such studies relied on fixed tissue or FP fusions driven by constitutive promoters, which can lead to significant artifacts. The maize genome sequence now provides access to regulatory regions that can be used to drive native expression. We have developed streamlined methods to generate maize FP-tagged lines using these regulatory elements, allowing analysis of tissue-specific expression and localized function. Identification of diverse proteins that function in specific subcellular compartments will provide the tools for understanding basic developmental, biochemical, and physiological processes in maize, with direct application potential for crop improvement.

Collaboration


Top Co-Authors

Yongwook Choi, Pohang University of Science and Technology

Foo Cheung, J. Craig Venter Institute

Geo Pertea, Johns Hopkins University