Introducing FAIR Samples

John Chodacki, October 2, 2024

Physical samples are foundational to research, and much sample-based work begins in settings such as field stations, nature reserves, and marine laboratories. Yet the workflows for collecting, cataloging, managing, and publishing sample information are often disconnected from both the broader research lifecycle and the operational contexts in which the work begins. Even when identifier systems and metadata standards exist, they can be difficult to adopt in real-world field and station environments.

This is the focus of FAIR Samples, an NSF-funded EAGER project (NSF Award Number 2433320) co-led by UC3 and RSpace. The project centers on a pragmatic question: at this inflection point, how do we reduce manual work, increase FAIRness, and create workflows that researchers and field station staff can actually adopt?

Where FAIR Samples comes from

FAIR Samples builds on work at UC3 and with partners focused on connecting early-stage research activity to downstream data and publication workflows.

That includes:

FAIR Island, which explored place-based open science and how field station policies, practices, and outputs can be better connected across the research lifecycle through machine-actionable data management plans
Early work on identifying field sites, including proposals to create citable field site descriptors with persistent identifiers, and the broader set of lessons that informed later community work on organization identifiers
Persistent identifier infrastructure and community practice, including work on organization identifiers such as ROR and the operational practices that make PIDs usable in real workflows
Machine-actionable planning and workflow identifiers, including the DMPTool ecosystem and efforts to make data management planning information more structured and reusable
Vertical interoperability (NSF Grant Number: 2433321), a joint line of work with RSpace focused on practical end-to-end workflows that move information across tools in ways researchers and field station staff can realistically adopt

FAIR Samples applies these threads specifically to the sample lifecycle: identifying what needs to happen before, during, and after field collection so that sample context and identifiers can persist through to analysis, deposit, and reuse.

Our research partners, RSpace, bring a complementary perspective grounded in their work on sample management, electronic lab notebooks, and the integration of persistent identifiers into day-to-day research practice. As an early implementer of IGSN ID workflows, a DataCite service provider, and a frequent contributor to the Research Data Alliance (RDA), RSpace has focused on embedding identifier creation and metadata capture directly into the environments where researchers manage samples and experiments. This emphasis on usability and workflow integration aligns closely with their work on vertical interoperability and helps ensure that FAIR Samples is grounded in approaches that researchers can realistically adopt.

Next Steps

FAIR Samples is a research project exploring how to support end-to-end sample workflows in ways that align with how research actually happens. Our work builds on the foundational work of the larger research infrastructure community and focuses on embedding persistent identifiers early in the process, accommodating the realities of offline field collection, keeping metadata portable across tools, and ensuring that identifiers and their associated context can move with a sample from initial collection through to publication.

The workflow we are prototyping (end-to-end)

FAIR Samples is testing a workflow that stitches together existing open tools:

Pre-register identifiers before fieldwork. Bulk-create IGSN IDs and generate printable QR/labels that can be applied to samples in the field.
Collect sample metadata offline. Use FieldMark (offline-first mobile collection) to capture structured metadata, including scanning the IGSN QR into the record.
Import into an inventory + lab context. Import FieldMark records into RSpace as structured sample templates and sample records so metadata is preserved and usable.
Register and publish IGSNs with metadata. Complete required metadata and publish persistent landing pages for IGSNs when appropriate.
Link samples to experiments. Connect samples to experimental records so “what was used” is captured as part of the research narrative.
Deposit outputs to a repository with identifiers intact. Export bundles of documents/data to a repository (Dataverse as a proof-of-concept), including IGSN links as related materials so the identifier stays connected at the end of the workflow.
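
To make the flow of an identifier through these steps concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not the project's implementation: it does not call FieldMark, RSpace, DataCite, or Dataverse, and every name in it (SampleRecord, REQUIRED_FIELDS, the "DEMO-" identifier scheme, the bundle layout) is hypothetical. What it shows is one record carrying the same pre-registered identifier from reservation through to the deposit bundle.

```python
from dataclasses import dataclass, field

# Assumed minimum metadata for publishing; a real schema would differ.
REQUIRED_FIELDS = ("collector", "collected_on", "location")


@dataclass
class SampleRecord:
    igsn: str                      # placeholder ID, not a real IGSN
    state: str = "reserved"        # reserved -> collected -> published
    metadata: dict = field(default_factory=dict)
    experiments: list = field(default_factory=list)


def pre_register(prefix, count):
    """Step 1: reserve a batch of identifiers before fieldwork."""
    return [SampleRecord(igsn=f"{prefix}{n:04d}") for n in range(1, count + 1)]


def collect(record, metadata):
    """Steps 2-3: attach offline-captured metadata to the scanned record."""
    record.metadata.update(metadata)
    record.state = "collected"


def publish(record):
    """Step 4: publish only once the required metadata is complete."""
    missing = [name for name in REQUIRED_FIELDS if name not in record.metadata]
    if missing:
        raise ValueError(f"{record.igsn} is missing: {', '.join(missing)}")
    record.state = "published"


def link_experiment(record, experiment_id):
    """Step 5: note which experiment used this sample."""
    record.experiments.append(experiment_id)


def deposit_bundle(title, records):
    """Step 6: build deposit metadata that keeps published IDs attached."""
    return {
        "title": title,
        "related_materials": [
            {"identifier_type": "IGSN", "identifier": r.igsn}
            for r in records
            if r.state == "published"
        ],
    }
```

In this sketch the identifier is assigned once, in pre_register, and never regenerated. Every later step only adds context to the same record, which is the property the workflow above is trying to preserve across tools.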

Get involved

We’re actively looking for input from communities that manage samples and deposit to domain repositories, especially around common submission workflows (including manual steps), metadata schema crosswalk needs, and the best “handoff points” between tools. Please reach out to us or our partners at RSpace if you would like to collaborate or discuss!