Biodiversity Information Science and Standards | 2021

Internet of Samples: Progress report


Abstract


Material samples form an important part of the data infrastructure for many disciplines. Here, a material sample is a physical object, representative of some physical thing, on which observations can be made. Material samples may be collected initially for one project, but they can also be valuable resources for other studies in other disciplines. Collecting and curating material samples can be costly. Integrating institutionally managed sample collections with those sitting in individual offices or labs is necessary to facilitate large-scale, evidence-based scientific research. Many groups have recognized these problems and are working to make data related to material samples FAIR: findable, accessible, interoperable, and reusable.

The Internet of Samples (iSamples) is one of these projects. iSamples was funded by the United States National Science Foundation in 2020 with the following aims:

enable previously impossible connections between diverse and disparate sample-based observations;
support existing research programs and facilities that collect and manage diverse sample types;
facilitate new interdisciplinary collaborations; and
provide an efficient solution for FAIR samples, avoiding duplicate efforts in different domains (Davies et al. 2021).

The initial sample collections that will make up the Internet of Samples include those from the System for Earth Sample Registration (SESAR), Open Context, the Genomic Observatories Meta-Database (GEOME), and the Smithsonian Institution National Museum of Natural History (NMNH), representing the disciplines of geoscience, archaeology/anthropology, and biology.

To achieve these aims, the proposed iSamples infrastructure (Fig. 1) has two key components: iSamples in a Box (iSB) and iSamples Central (iSC). The iSBs create and maintain identifiers and metadata for their respective collections of samples. While providing access to the samples held locally, an iSB also allows iSC to harvest its metadata records. The iSC component will be a permanent Internet service that preserves, indexes, and provides access to sample metadata aggregated from iSBs. It will also ensure that persistent identifiers and sample descriptions assigned and used by individual iSBs are synchronized with the records in iSC and with identifier authorities such as the International Geo Sample Number (IGSN) or Archival Resource Key (ARK).

The iSamples project has adopted a profile-based metadata modeling strategy, in which the metadata fields applicable to all samples form the iSamples core metadata schema. Each participating collection is free to include additional metadata in its records, which will also be harvested by iSC and be discoverable through the iSC user interface or APIs (Application Programming Interfaces), just like the core. In-depth analysis of the metadata profiles used by participating collections, including Darwin Core, has resulted in an iSamples core schema that is currently being tested and refined through use. See the current version of the iSamples core schema.

A number of properties require a controlled vocabulary. Controlled vocabularies used by existing records are kept, while new vocabularies are being developed to support high-level grouping with consistent semantics across collection types. Examples include vocabularies for Context Category, Material Category, and Specimen Type (Table 1). These vocabularies were developed in a bottom-up manner, based on the terms used in the existing collections. For each vocabulary, a decision tree graph was created to illustrate relations among the terms, and a card sorting exercise was conducted within the project team to collect feedback. Domain experts are invited to take part in this exercise for each of the three vocabularies. The new terms will serve as upper-level terms for the category terms already used in the participating collections, and hence will create connections among those collections.

iSamples project members are also active in the TDWG Material Sample Task Group and the global consultation on Digital Extended Specimens. Many members of the iSamples project also lead or participate in a sister research coordination network (RCN), Sampling Nature. The goal of this RCN is to develop and refine metadata standards and controlled vocabularies for iSamples and other projects focusing on material samples. We cordially invite you to participate in the Sampling Nature RCN and help shape the future standards for material samples. Contact Sarah Ramdeen ([email protected]) to engage with the RCN.
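
As a rough illustration of the profile-based approach and the upper-level vocabulary terms described in the abstract, the following Python sketch splits a sample record into core fields and collection-specific extras, and maps a collection's own category term to an upper-level term. The field names, example terms, identifier, and helper functions are all hypothetical and chosen only for illustration; they are not taken from the iSamples core schema or its vocabularies.

```python
# Hypothetical core field names; the actual iSamples core schema may differ.
CORE_FIELDS = {"sample_identifier", "label", "material_category"}

# Hypothetical mapping from collection-level terms to upper-level terms.
UPPER_LEVEL_MATERIAL = {
    "basalt": "rock",
    "granite": "rock",
    "bone": "organic material",
}


def split_record(record):
    """Separate a record into core fields and collection-specific extras."""
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    extra = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    return core, extra


def upper_level_term(local_term, mapping=UPPER_LEVEL_MATERIAL):
    """Return the upper-level term for a local term, or None if unmapped."""
    return mapping.get(local_term.strip().lower())


record = {
    "sample_identifier": "example:0001",  # placeholder, not a real identifier
    "label": "Hand sample 1",
    "material_category": "Basalt",
    "collector_notes": "collected from outcrop",  # collection-specific field
}

core, extra = split_record(record)
print(sorted(core))   # names of the core fields present in the record
print(sorted(extra))  # names of the collection-specific fields
print(upper_level_term(record["material_category"]))
```

Keeping the extras alongside the core, rather than discarding them, reflects the statement above that additional metadata are harvested and remain discoverable just like the core.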

DOI 10.3897/biss.5.75797
Language English
Journal Biodiversity Information Science and Standards
