Publication


Featured research published by Françoise Thibaud-Nissen.


Nucleic Acids Research | 2007

The TIGR Rice Genome Annotation Resource: improvements and new features

Shu Ouyang; Wei Zhu; John A. Hamilton; Haining Lin; Matthew Campbell; Kevin L. Childs; Françoise Thibaud-Nissen; Renae L. Malek; Yuandan Lee; Li Zheng; Joshua Orvis; Brian J. Haas; Jennifer R. Wortman; C. Robin Buell

In The Institute for Genomic Research Rice Genome Annotation project, we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42 653 non-transposable element-related genes encoding 49 472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13 237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24 799 genes (31 739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.


Nucleic Acids Research | 2014

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt; Garth Brown; Susan M. Hiatt; Françoise Thibaud-Nissen; Alexander Astashyn; Olga Ermolaeva; Catherine M. Farrell; Jennifer Hart; Melissa J. Landrum; Kelly M. McGarvey; Michael R. Murphy; Nuala A. O’Leary; Shashikant Pujar; Bhanu Rajput; Sanjida H. Rangwala; Lillian D. Riddick; Andrei Shkeda; Hanzhen Sun; Pamela Tamez; Raymond E. Tully; Craig Wallin; David Webb; Janet Weber; Wendy Wu; Michael DiCuccio; Paul Kitts; Donna Maglott; Terence Murphy; James Ostell

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI’s eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI’s eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Nucleic Acids Research | 2016

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary; Mathew W. Wright; J. Rodney Brister; Stacy Ciufo; Diana Haddad; Richard McVeigh; Bhanu Rajput; Barbara Robbertse; Brian Smith-White; Danso Ako-adjei; Alexander Astashyn; Azat Badretdin; Yiming Bao; Olga Blinkova; Vyacheslav Brover; Vyacheslav Chetvernin; Jinna Choi; Eric Cox; Olga Ermolaeva; Catherine M. Farrell; Tamara Goldfarb; Tripti Gupta; Daniel H. Haft; Eneida Hatcher; Wratko Hlavina; Vinita Joardar; Vamsi K. Kodali; Wenjun Li; Donna Maglott; Patrick Masterson

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Genome Biology | 2014

Genome of the house fly, Musca domestica L., a global vector of diseases with adaptations to a septic environment

Jeffrey G. Scott; Wesley C. Warren; Leo W. Beukeboom; Daniel Bopp; Andrew G. Clark; Sarah D. Giers; Monika Hediger; Andrew K. Jones; Shinji Kasai; Cheryl A. Leichter; Ming Li; Richard P. Meisel; Patrick Minx; Terence Murphy; David R. Nelson; William R. Reid; Frank D. Rinkevich; Hugh M. Robertson; Timothy B. Sackton; David B. Sattelle; Françoise Thibaud-Nissen; Chad Tomlinson; Louis Jacobus Mgn Van De Zande; Kimberly K. O. Walden; Richard Wilson; Nannan Liu

Background: Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens. Results: We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation. Conclusions: This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for enabling work on innovative methods of insect control, for understanding the mechanisms of insecticide resistance, genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close out-group to Drosophila in comparative genomic studies.


Plant Physiology | 2009

Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice

Cheng Zou; Melissa D. Lehti-Shiu; Françoise Thibaud-Nissen; Tanmay Prakash; C. Robin Buell; Shin Han Shiu

Pseudogenes (Ψ) are nonfunctional genomic sequences resembling functional genes. Knowledge of Ψs can improve genome annotation and our understanding of genome evolution. However, there has been relatively little systemic study of Ψs in plants. In this study, we characterized the evolution and expression patterns of Ψs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). In contrast to animal Ψs, many plant Ψs experienced much stronger purifying selection. In addition, plant Ψs experiencing stronger selective constraints tend to be derived from relatively ancient duplicates, suggesting that they were functional for a relatively long time but became Ψs recently. Interestingly, the regions 5′ to the first stops in the Ψs have experienced stronger selective constraints compared with 3′ regions, suggesting that the 5′ regions were functional for a longer period of time after the premature stops appeared. We found that few Ψs have expression evidence, and their expression levels tend to be lower compared with annotated genes. Furthermore, Ψs with expressed sequence tags tend to be derived from relatively recent duplication events, indicating that Ψ expression may be due to insufficient time for complete degeneration of regulatory signals. Finally, larger protein domain families have significantly more Ψs in general. However, while families involved in environmental stress responses have a significant excess of Ψs, transcription factors and receptor-like kinases have lower than expected numbers of Ψs, consistent with their elevated retention rate in plant genomes. Our findings illustrate peculiar properties of plant Ψs, providing additional insight into the evolution of duplicate genes and benefiting future genome annotation.


Plant Molecular Biology | 2006

Expressed Sequence Tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis

John W.G. Cairney; Li Zheng; Allison Cowels; Joseph Hsiao; Victoria Zismann; Jia Liu; Shu Ouyang; Françoise Thibaud-Nissen; John P. Hamilton; Kevin L. Childs; Gerald S. Pullman; Yiting Zhang; Thomas J. Oh; C. Robin Buell

The process of embryogenesis in gymnosperms differs in significant ways from the more widely studied process in angiosperms. To further our understanding of embryogenesis in gymnosperms, we have generated Expressed Sequence Tags (ESTs) from four cDNA libraries constructed from un-normalized, normalized, and subtracted RNA populations of zygotic and somatic embryos of loblolly pine (Pinus taeda L.). A total of 68,721 ESTs were generated from 68,131 cDNA clones. Following clustering and assembly, these sequences collapsed into 5,274 contigs and 6,880 singleton sequences for a total of 12,154 non-redundant sequences. Searches of a non-identical amino acid database revealed a putative homolog for 9,189 sequences, leaving 2,965 sequences with no known function. More extensive searches of additional plant sequence data sets revealed a putative homolog for all but 1,388 (11.4%) of the sequences. Using gene ontologies, a known function could be assigned for 5,495 of the 12,154 total non-redundant sequences with 13,633 associations in total assigned. When compared to ∼72,000 sequences in a collated P. taeda transcript assembly derived from >245,000 ESTs derived from root, xylem, stem, needles, pollen cone, and shoot ESTs, 3,458 (28.5%) of the non-redundant embryo sequences were unique and thereby provide a valuable addition to development of a complete loblolly pine transcriptome. To assess similarities between angiosperm and gymnosperm embryo development, we examined our EST collection for putative homologs of angiosperm genes implicated in embryogenesis. Out of 108 angiosperm embryogenesis-related genes, homologs were present for 83 of these genes suggesting that pine contains similar genes for embryogenesis and that our RNA sampling methods were successful. We also identified sequences from the pine embryo transcriptome that have no known function and may contribute to the programming of gene expression and embryo development.


Nature Communications | 2016

Long-read sequencing and de novo assembly of a Chinese genome

Lingling Shi; Yunfei Guo; Chengliang Dong; John Huddleston; Hui Yang; Xiaolu Han; Aisi Fu; Quan Li; Na Li; Siyi Gong; Katherine E Lintner; Qiong Ding; Zou Wang; Jiang Hu; Depeng Wang; Feng Wang; Lin Wang; Gholson J. Lyon; Yongtao Guan; Yufeng Shen; Oleg V. Evgrafov; James A. Knowles; Françoise Thibaud-Nissen; Valerie Schneider; Chack Yung Yu; Libing Zhou; Evan E. Eichler; Kf So; Kai Wang

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.


Genome Research | 2017

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Valerie Schneider; Tina A. Graves-Lindsay; Kerstin Howe; Nathan Bouk; Hsiu-Chuan Chen; Paul Kitts; Terence Murphy; Kim D. Pruitt; Françoise Thibaud-Nissen; Derek Albracht; Robert S. Fulton; Milinn Kremitzki; Vincent Magrini; Chris Markovic; Sean McGrath; Karyn Meltz Steinberg; Kate Auger; William Chow; Joanna Collins; Glenn Harden; Tim Hubbard; Sarah Pelan; Jared T. Simpson; Glen Threadgold; James Torrance; Jonathan Wood; Laura Clarke; Sergey Koren; Matthew Boitano; Paul Peluso

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


G3: Genes, Genomes, Genetics | 2017

A New Chicken Genome Assembly Provides Insight into Avian Genome Structure

Wesley C. Warren; LaDeana W. Hillier; Chad Tomlinson; Patrick Minx; Milinn Kremitzki; Tina Graves; Chris Markovic; Nathan Bouk; Kim D. Pruitt; Françoise Thibaud-Nissen; Valerie Schneider; Tamer Mansour; C. Titus Brown; Aleksey V. Zimin; R. J. Hawken; Mitch Abrahamsen; Alexis B. Pyrkosz; Mireille Morisson; Valerie Fillon; Alain Vignal; William Chow; Kerstin Howe; Janet E. Fulton; Marcia M. Miller; Peter V. Lovell; Claudio V. Mello; Morgan Wirthlin; Andrew S. Mason; Richard Kuo; David W. Burt

The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.


BMC Genomics | 2009

Identification and characterization of pseudogenes in the rice gene complement

Françoise Thibaud-Nissen; Shu Ouyang; C. Robin Buell

Background: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within the Osa1 Release 5 were investigated as potential pseudogenes as these genes exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, short coding region, long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or significantly shorter corresponding paralog. Results: A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully-supported gene models and the presence of frameshifts or premature translational stop codons. Significant difference in the length of duplicated genes within segmentally-duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, F-box proteins, BTB/POZ proteins, terpene synthases, chalcone synthases and cytochrome P450 protein families were found to harbor large numbers of pseudogenes. Conclusion: These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.

Collaboration


Françoise Thibaud-Nissen's most frequent collaborators.

Top Co-Authors

Terence Murphy

National Institutes of Health

Paul Kitts

National Institutes of Health

Michael DiCuccio

National Institutes of Health

Kim D. Pruitt

National Institutes of Health

C. Robin Buell

Michigan State University

Chad Tomlinson

University of Washington

Patrick Minx

University of Washington

Shu Ouyang

J. Craig Venter Institute

Valerie Schneider

National Institutes of Health
