Yiming Bao

National Institutes of Health

Publication

Featured research published by Yiming Bao.

Journal of Virology | 2008

The Influenza Virus Resource at the National Center for Biotechnology Information

Yiming Bao; Pavel Bolotov; Dmitry Dernovoy; Boris Kiryutin; Leonid Zaslavsky; Tatiana Tatusova; Jim Ostell; David J. Lipman

Influenza epidemics cause morbidity and mortality worldwide (4). Each year in the United States, more than 200,000 patients are admitted to hospitals because of influenza and there are approximately 36,000 influenza-related deaths (14). In recent years, several subtypes of avian influenza viruses have jumped host species to infect humans. The H5N1 subtype, in particular, has been reported in 328 human cases and has caused 200 human deaths in 12 countries (World Health Organization, http://www.who.int/csr/disease/avian_influenza/country/cases_table_2007_09_10/en/index.html). These viruses have the potential to cause a pandemic in humans. Antiviral drugs and vaccines must be developed to minimize the damage that such a pandemic would bring. To achieve this, it is vital that researchers have free access to viral sequences in a timely fashion, and sequence analysis tools need to be readily available. Historically, the number of influenza virus sequences in public databases has been far less than those of some well-studied viruses, such as human immunodeficiency virus. The number of complete influenza virus genomes has been even smaller. In addition, many of the sequences were collected in the course of influenza surveillance programs that prioritized antigenically novel isolates. Although collecting antigenically novel isolates is appropriate for surveillance, it results in biased samples of sequenced isolates that are not representative of community cases of influenza (2, 13). Therefore, in 2004, the National Institute of Allergy and Infectious Diseases (NIAID) launched the Influenza Genome Sequencing Project (7), which aims to rapidly sequence influenza viruses from samples collected all over the world. Viral sequences were generated at the J. Craig Venter Institute, annotated at the National Center for Biotechnology Information (NCBI), and deposited in GenBank. 
In just over 2 years after the initiation of the project, more than 2,000 complete genomes of influenza viruses A and B had been deposited in GenBank. To help the research community to make full use of the wealth of information from such a large amount of data, which will be increasing continuously, the Influenza Virus Resource was created at NCBI in 2004.

Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

PLOS Biology | 2005

Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses

Edward C. Holmes; Elodie Ghedin; Naomi Miller; Jill Taylor; Yiming Bao; Kirsten St. George; Bryan T. Grenfell; Claire M. Fraser; David J. Lipman; Jeffery K. Taubenberger

Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States, and observed multiple co-circulating clades with different population frequencies. Strikingly, phylogenies inferred for individual gene segments revealed that multiple reassortment events had occurred among these clades, such that one clade of H3N2 viruses present at least since 2000 had provided the hemagglutinin gene for all those H3N2 viruses sampled after the 2002–2003 influenza season. This reassortment event was the likely progenitor of the antigenically variant influenza strains that caused the A/Fujian/411/2002-like epidemic of the 2003–2004 influenza season. However, despite sharing the same hemagglutinin, these phylogenetically distinct lineages of viruses continue to co-circulate in the same population. These data, derived from the first large-scale analysis of H3N2 viruses, convincingly demonstrate that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.

Molecular Plant-Microbe Interactions | 2004

The Tobacco mosaic virus 126-kDa Protein Associated with Virus Replication and Movement Suppresses RNA Silencing

Xin Shun Ding; Jian-Zhong Liu; Ninghui Cheng; Alexey Folimonov; Yu-Ming Hou; Yiming Bao; Chika Katagi; Shelly A. Carter; Richard S. Nelson

Systemic symptoms induced on Nicotiana tabacum cv. Xanthi by Tobacco mosaic virus (TMV) are modulated by one or both amino-coterminal viral 126- and 183-kDa proteins: proteins involved in virus replication and cell-to-cell movement. Here we compare the systemic accumulation and gene silencing characteristics of TMV strains and mutants that express altered 126- and 183-kDa proteins and induce varying intensities of systemic symptoms on N. tabacum. Through grafting experiments, it was determined that M(IC)1,3, a mutant of the masked strain of TMV that accumulated locally and induced no systemic symptoms, moved through vascular tissue but failed to accumulate to high levels in systemic leaves. The lack of M(IC)1,3 accumulation in systemic leaves was correlated with RNA silencing activity in this tissue through the appearance of virus-specific, approximately 25-nucleotide RNAs and the loss of fluorescence from leaves of transgenic plants expressing the 126-kDa protein fused with green fluorescent protein (GFP). The ability of TMV strains and mutants altered in the 126-kDa protein open reading frame to cause systemic symptoms was positively correlated with their ability to transiently extend expression of the 126-kDa protein:GFP fusion and transiently suppress the silencing of free GFP in transgenic N. tabacum and transgenic N. benthamiana, respectively. Suppression of GFP silencing in N. benthamiana occurred only where virus accumulated to high levels. Using agroinfiltration assays, it was determined that the 126-kDa protein alone could delay GFP silencing. Based on these results and the known synergies between TMV and other viruses, the mechanism of suppression by the 126-kDa protein is compared with those utilized by other originally characterized suppressors of RNA silencing.

Nucleic Acids Research | 2015

NCBI Viral Genomes Resource

J. Rodney Brister; Danso Ako-adjei; Yiming Bao; Olga Blinkova

Recent technological innovations have ignited an explosion in virus genome sequencing that promises to fundamentally alter our understanding of viral biology and profoundly impact public health policy. Yet, any potential benefits from the billowing cloud of next generation sequence data hinge upon well implemented reference resources that facilitate the identification of sequences, aid in the assembly of sequence reads and provide reference annotation sources. The NCBI Viral Genomes Resource is a reference resource designed to bring order to this sequence shockwave and improve usability of viral sequence data. The resource can be accessed at http://www.ncbi.nlm.nih.gov/genome/viruses/ and catalogs all publicly available virus genome sequences and curates reference genome sequences. As the number of genome sequences has grown, so too have the difficulties in annotating and maintaining reference sequences. The rapid expansion of the viral sequence universe has forced a recalibration of the data model to better provide extant sequence representation and enhanced reference sequence products to serve the needs of the various viral communities. This, in turn, has placed increased emphasis on leveraging the knowledge of individual scientific communities to identify important viral sequences and develop well annotated reference virus genome sets.

Archives of Virology | 2013

Virus nomenclature below the species level: a standardized nomenclature for natural variants of viruses assigned to the family Filoviridae

Jens H. Kuhn; Yiming Bao; Sina Bavari; Stephan Becker; Steven B. Bradfute; J. Rodney Brister; Alexander Bukreyev; Kartik Chandran; Robert A. Davey; Olga Dolnik; John M. Dye; Sven Enterlein; Lisa E. Hensley; Anna N. Honko; Peter B. Jahrling; Karl M. Johnson; Gary P. Kobinger; Eric Leroy; Mark S. Lever; Elke Mühlberger; Sergey V. Netesov; Gene G. Olinger; Gustavo Palacios; Jean L. Patterson; Janusz T. Paweska; Louise Pitt; Sheli R. Radoshitzky; Erica Ollmann Saphire; Sophie J. Smither; Robert Swanepoel

The task of international expert groups is to recommend the classification and naming of viruses. The International Committee on Taxonomy of Viruses Filoviridae Study Group and other experts have recently established an almost consistent classification and nomenclature for filoviruses. Here, further guidelines are suggested to include their natural genetic variants. First, this term is defined. Second, a template for full-length virus names (such as “Ebola virus H.sapiens-tc/COD/1995/Kikwit-9510621”) is proposed. These names contain information on the identity of the virus (e.g., Ebola virus), isolation host (e.g., members of the species Homo sapiens), sampling location (e.g., Democratic Republic of the Congo (COD)), sampling year, genetic variant (e.g., Kikwit), and isolate (e.g., 9510621). Suffixes are proposed for individual names that clarify whether a given genetic variant has been characterized based on passage zero material (-wt), has been passaged in tissue/cell culture (-tc), is known from consensus sequence fragments only (-frag), or does (most likely) not exist anymore (-hist). We suggest that these comprehensive names are to be used specifically in the methods section of publications. Suitable abbreviations, also proposed here, could then be used throughout the text, while the full names could be used again in phylograms, tables, or figures if the contained information aids the interpretation of presented data. The proposed system is very similar to the well-known influenzavirus nomenclature and the nomenclature recently proposed for rotaviruses. If applied consistently, it would considerably simplify retrieval of sequence data from electronic databases and be a first important step toward a viral genome annotation standard as sought by the National Center for Biotechnology Information (NCBI). 
Furthermore, adoption of this nomenclature would increase the general understanding of filovirus-related publications and presentations and improve figures such as phylograms, alignments, and diagrams. Most importantly, it would counter the increasing confusion in genetic variant naming due to the identification of ever more sequences through technological breakthroughs in high-throughput sequencing and environmental sampling.
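The naming template described in this abstract lends itself to machine parsing, which is part of how it could simplify retrieval from databases. The following is a minimal sketch, not code from the paper: it assumes the field order shown in the example name (virus, host with optional suffix, three-letter country code, four-digit year, variant, isolate), and the names `FilovirusName` and `parse_filovirus_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class FilovirusName(NamedTuple):
    """Fields of a full-length filovirus name (hypothetical container)."""
    virus: str
    host: str
    suffix: Optional[str]  # wt, tc, frag, or hist, if present
    country: str
    year: int
    variant: str
    isolate: str


# Assumed layout, inferred from the example in the abstract:
#   <virus> <host>[-suffix]/<country>/<year>/<variant>-<isolate>
_NAME_PATTERN = re.compile(
    r"^(?P<virus>.+?) "
    r"(?P<host>[A-Z]\.[a-z]+)(?:-(?P<suffix>wt|tc|frag|hist))?/"
    r"(?P<country>[A-Z]{3})/"
    r"(?P<year>\d{4})/"
    r"(?P<variant>[^-/]+)-(?P<isolate>[^/]+)$"
)


def parse_filovirus_name(name: str) -> FilovirusName:
    """Split a full-length name into its fields, or raise ValueError."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized full-length name: {name!r}")
    return FilovirusName(
        virus=match["virus"],
        host=match["host"],
        suffix=match["suffix"],
        country=match["country"],
        year=int(match["year"]),
        variant=match["variant"],
        isolate=match["isolate"],
    )


parsed = parse_filovirus_name("Ebola virus H.sapiens-tc/COD/1995/Kikwit-9510621")
print(parsed)
```

The pattern only accepts the four suffixes named in the abstract; names that follow other conventions would need a wider pattern.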

Archives of Virology | 2014

Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification

Yiming Bao; Vyacheslav Chetvernin; Tatiana Tatusova

The number of viral genome sequences in the public databases is increasing dramatically, and these sequences are playing an important role in virus classification. Pairwise sequence comparison is a sequence-based virus classification method. A program using this method calculates the pairwise identities of virus sequences within a virus family and displays their distribution, and visual analysis helps to determine demarcations at different taxonomic levels such as strain, species, genus and subfamily. Subsequent comparison of new sequences against existing ones allows viruses from which the new sequences were derived to be classified. Although this method cannot be used as the only criterion for virus classification in some cases, it is a quantitative method and has many advantages over conventional virus classification methods. It has been applied to several virus families, and there is an increasing interest in using this method for other virus families/groups. The Pairwise Sequence Comparison (PASC) classification tool was created at the National Center for Biotechnology Information. The tool’s database stores pairwise identities for complete genomes/segments of 56 virus families/groups. Data in the system are updated every day to reflect changes in virus taxonomy and additions of new virus sequences to the public database. The web interface of the tool (http://www.ncbi.nlm.nih.gov/sutils/pasc/) makes it easy to navigate and perform analyses. Multiple new viral genome sequences can be tested simultaneously with this system to suggest the taxonomic position of virus isolates in a specific family. PASC eliminates potential discrepancies in the results caused by different algorithms and/or different data used by researchers.
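The core idea of pairwise sequence comparison, computing an identity for every pair of sequences and then looking at how those identities are distributed, can be illustrated with a small sketch. This is not the PASC implementation: it assumes the sequences are already aligned to equal length, uses a simple per-column identity, and the function names and bin width are hypothetical choices.

```python
from collections import Counter
from itertools import combinations
from typing import Dict


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_distribution(seqs: Dict[str, str], bin_width: int = 10) -> Counter:
    """Count sequence pairs per identity bin, keyed by the bin's lower bound."""
    histogram: Counter = Counter()
    for (_, a), (_, b) in combinations(seqs.items(), 2):
        pct = pairwise_identity(a, b)
        # Put 100% into the top bin instead of opening a bin of its own.
        lower = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        histogram[lower] += 1
    return histogram


# Toy aligned sequences, invented purely for illustration.
toy = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAC",
    "c": "ACGTACGTTT",
    "d": "TTTTTTTTTT",
}
for lower, count in sorted(identity_distribution(toy).items()):
    print(f"{lower:3d}-{lower + 10:3d}%: {'#' * count}")
```

In a real family-level analysis, gaps between peaks in such a histogram are what suggest demarcation thresholds between taxonomic levels; the toy data here is far too small to show that.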

Archives of Virology | 2013

Virus nomenclature below the species level: A standardized nomenclature for filovirus strains and variants rescued from cDNA

Jens H. Kuhn; Yiming Bao; Sina Bavari; Stephan Becker; Steven B. Bradfute; Kristina Brauburger; J. Rodney Brister; Alexander Bukreyev; Yíngyún Caì; Kartik Chandran; Robert A. Davey; Olga Dolnik; John M. Dye; Sven Enterlein; Jean-Paul Gonzalez; Pierre Formenty; Alexander N. Freiberg; Lisa E. Hensley; Thomas Hoenen; Anna N. Honko; Georgy M. Ignatyev; Peter B. Jahrling; Karl M. Johnson; Hans-Dieter Klenk; Gary P. Kobinger; Matthew G. Lackemeyer; Eric M. Leroy; Mark S. Lever; Elke Mühlberger; Sergey V. Netesov

Specific alterations (mutations, deletions, insertions) of virus genomes are crucial for the functional characterization of their regulatory elements and their expression products, as well as a prerequisite for the creation of attenuated viruses that could serve as vaccine candidates. Virus genome tailoring can be performed either by using traditionally cloned genomes as starting materials, followed by site-directed mutagenesis, or by de novo synthesis of modified virus genomes or parts thereof. A systematic nomenclature for such recombinant viruses is necessary to set them apart from wild-type and laboratory-adapted viruses, and to improve communication and collaborations among researchers who may want to use recombinant viruses or create novel viruses based on them. A large group of filovirus experts has recently proposed nomenclatures for natural and laboratory animal-adapted filoviruses that aim to simplify the retrieval of sequence data from electronic databases. Here, this work is extended to include nomenclature for filoviruses obtained in the laboratory via reverse genetics systems. The previously developed template for natural filovirus genetic variant naming, <virus name> (<strain>/)<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>, is retained, but we propose to adapt the type of information added to each field for cDNA clone-derived filoviruses. For instance, the full-length designation of an Ebola virus Kikwit variant rescued from a plasmid developed at the US Centers for Disease Control and Prevention could be akin to “Ebola virus H.sapiens-rec/COD/1995/Kikwit-abc1” (with the suffix “rec” identifying the recombinant nature of the virus and “abc1” being a placeholder for any meaningful isolate designator). Such a full-length designation should be used in databases and the methods section of publications.
Shortened designations (such as “EBOV H.sap/COD/95/Kik-abc1”) and abbreviations (such as “EBOV/Kik-abc1”) could be used in the remainder of the text, depending on how critical it is to convey information contained in the full-length name. “EBOV” would suffice if only one EBOV strain/variant/isolate is addressed.
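The relationship between the full-length, shortened, and abbreviated forms in the examples above can be sketched in code. The shortening rules used here (drop the host suffix, keep three letters of the species epithet and of the variant, keep two digits of the year) are assumptions inferred only from the examples quoted in the abstract, not rules stated by the authors, and `shorten_designation` is a hypothetical name. The virus abbreviation is passed in by the caller rather than looked up.

```python
from typing import Tuple


def shorten_designation(full_name: str, virus_abbrev: str) -> Tuple[str, str]:
    """Return (shortened designation, abbreviation) for a full-length name.

    Assumes the last space-separated token holds the slash-delimited fields
    and that its final four fields are host, country, year, variant-isolate.
    Any fields before those four (such as a strain) are kept unchanged.
    """
    _, _, fields_part = full_name.rpartition(" ")
    fields = fields_part.split("/")
    if len(fields) < 4:
        raise ValueError(f"expected at least four '/' fields in {full_name!r}")
    prefix = fields[:-4]
    host_field, country, year, variant_isolate = fields[-4:]

    # "H.sapiens-rec" -> "H.sap": drop the suffix, truncate the epithet.
    host = host_field.split("-", 1)[0]
    genus, _, epithet = host.partition(".")
    short_host = f"{genus}.{epithet[:3]}"

    # "Kikwit-abc1" -> "Kik-abc1": truncate the variant, keep the isolate.
    variant, _, isolate = variant_isolate.partition("-")
    short_variant = f"{variant[:3]}-{isolate}"

    shortened = f"{virus_abbrev} " + "/".join(
        prefix + [short_host, country, year[-2:], short_variant]
    )
    abbreviation = f"{virus_abbrev}/{short_variant}"
    return shortened, abbreviation


shortened, abbreviation = shorten_designation(
    "Ebola virus H.sapiens-rec/COD/1995/Kikwit-abc1", "EBOV"
)
print(shortened)     # EBOV H.sap/COD/95/Kik-abc1
print(abbreviation)  # EBOV/Kik-abc1
```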

Viruses | 2014

Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

Jens H. Kuhn; Kristian G. Andersen; Yiming Bao; Sina Bavari; Stephan Becker; Richard S. Bennett; Nicholas H. Bergman; Olga Blinkova; Steven B. Bradfute; J. Rodney Brister; Alexander Bukreyev; Kartik Chandran; Alexander A. Chepurnov; Robert A. Davey; Ralf G. Dietzgen; Norman A. Doggett; Olga Dolnik; John M. Dye; Sven Enterlein; Paul W. Fenimore; Pierre Formenty; Alexander N. Freiberg; Robert F. Garry; Nicole L. Garza; Stephen K. Gire; Jean-Paul Gonzalez; Anthony Griffiths; Christian T. Happi; Lisa E. Hensley; Andrew S. Herbert

Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

Archives of Virology | 2013

Virus nomenclature below the species level: a standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae

Jens H. Kuhn; Yiming Bao; Sina Bavari; Stephan Becker; Steven B. Bradfute; J. Rodney Brister; Alexander Bukreyev; Yíngyún Caì; Kartik Chandran; Robert A. Davey; Olga Dolnik; John M. Dye; Sven Enterlein; Jean-Paul Gonzalez; Pierre Formenty; Alexander N. Freiberg; Lisa E. Hensley; Anna N. Honko; Georgy M. Ignatyev; Peter B. Jahrling; Karl M. Johnson; Hans-Dieter Klenk; Gary P. Kobinger; Matthew G. Lackemeyer; Eric Leroy; Mark S. Lever; Loreen L. Lofts; Elke Mühlberger; Sergey V. Netesov; Gene G. Olinger

The International Committee on Taxonomy of Viruses (ICTV) organizes the classification of viruses into taxa, but is not responsible for the nomenclature for taxa members. International expert groups, such as the ICTV Study Groups, recommend the classification and naming of viruses and their strains, variants, and isolates. The ICTV Filoviridae Study Group has recently introduced an updated classification and nomenclature for filoviruses. Subsequently, and together with numerous other filovirus experts, a consistent nomenclature for their natural genetic variants and isolates was developed that aims at simplifying the retrieval of sequence data from electronic databases. This is a first important step toward a viral genome annotation standard as sought by the US National Center for Biotechnology Information (NCBI). Here, this work is extended to include filoviruses obtained in the laboratory by artificial selection through passage in laboratory hosts. The previously developed template for natural filovirus genetic variant naming (<virus name> <isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>) is retained, but it is proposed to adapt the type of information added to each field for laboratory animal-adapted variants. For instance, the full-length designation of an Ebola virus Mayinga variant adapted at the State Research Center for Virology and Biotechnology “Vector” to cause disease in guinea pigs after seven passages would be akin to “Ebola virus VECTOR/C.porcellus-lab/COD/1976/Mayinga-GPA-P7”. As was proposed for the names of natural filovirus variants, we suggest using the full-length designation in databases, as well as in the method section of publications. Shortened designations (such as “EBOV VECTOR/C.por/COD/76/May-GPA-P7”) and abbreviations (such as “EBOV/May-GPA-P7”) could be used in the remainder of the text depending on how critical it is to convey information contained in the full-length name. “EBOV” would suffice if only one EBOV strain/variant/isolate is addressed.
