Archive | 2021

Building cyberinfrastructure systems to support integrative, macroscale analyses of sedimentary ancient DNA records: current resources, needs, and opportunities

 

Abstract


The number and extent of ancient DNA records from sedimentary environments (sedaDNA) are rapidly increasing, creating new opportunities for integrative, macroscale investigations into past population, community, and environmental dynamics at unprecedented taxonomic resolution and spatiotemporal extent. Fully achieving this potential, however, requires a robust cyberinfrastructure that supports the joint analysis of many sedaDNA records with each other and with the latest geochronological controls and age-depth models, complementary paleoecological and paleoenvironmental proxies, and the most recent, updated DNA reference libraries for taxonomic identification. Any cyberinfrastructure for macroscale data synthesis must address the variety of ancient DNA records (e.g. taxonomic groups, analytical approaches, depositional contexts) and leverage existing resources and standards such as the Neotoma Paleoecology Database, the MGnify and MG-RAST resources for environmental genomics, and the MIxS standard for genetic sequences. In response, a Cyberinfrastructure for Ancient Sedimentary DNA working group has been meeting regularly since summer 2020 to assess the current state of science and informatics, identify needs and gaps, and establish recommendations for next steps. An initial survey found over 420 sites worldwide with published or in-development sedaDNA records, with the greatest densities in Eurasia. Metabarcoding records, including Amplicon Sequence Variant data and derived taxonomic inferences, are a top priority for trial uploads to Neotoma, and pilot uploads are underway. The reasons are the relatively small dataset volumes, the widespread application of metabarcoding assays, and the potential for integrating these records with other paleoecological data holdings in Neotoma and with linked paleodata resources such as LinkedEarth and the paleoclimatic data at NOAA’s National Centers for Environmental Information. Because taxonomic inferences are heavily conditioned by the choice of bioinformatics pipeline and reference databases, a major unmet need is a repository for minimally processed output from raw sequences. In general, no existing genomics or paleoecological resource meets all needs of the sedaDNA community, although each covers key elements, so there is good potential for advancing macroscale data syntheses by leveraging and linking existing resources.

DOI 10.5194/egusphere-egu21-6142
Language English
