
Our Path to FAIR Station

At UC3, our work focuses on how research activities, outputs, and systems connect across the lifecycle, from planning and data collection through to publication and reuse. We approach these challenges from a research infrastructure and information management perspective, which naturally extends upstream to where research begins, including field stations and other place-based environments. As we embark on the FAIR Station project, we wanted to reflect on some of the many projects and work that got us to this point.

Early foundations

UC3 was founded as CDL’s digital curation program to support the full research lifecycle. For more than 15 years, our work has centered on enabling connections from planning and data collection through to publication and reuse. One of our most formative collaborations was with the DataONE community, whose vision was ambitious: to support discovery, access, and reuse of environmental data across a distributed landscape, grounded in the realities of field-based, place-dependent research.

DataONE’s emphasis on lifecycle coordination and distributed data collection reinforced the importance of capturing context at the point of origin. These ideas, grounded in foundational work by UC3 and supported through NSF investment, continue to inform our work. At the same time, UC3’s other collaborations explored various entry points into the research lifecycle. 

As a founding member of the DataCite community, we helped lead the adoption of DOIs for research data, strengthening how outputs are identified and connected. The DMPTool supports planning and structuring, while repository services enable publication and citation. In parallel, early work on field station identifiers, including pilots and community presentations, surfaced the importance of treating place as a first-class entity rather than background context. 

Engagement with communities such as RDA, ESIP, and OBFS further highlighted the challenges of connecting data, samples, and locations across distributed systems. This work established a strong foundation, but also exposed a critical gap: research context is not consistently captured or linked across the lifecycle.

Making infrastructure actionable

Over the past 10 years, a series of funded projects and community initiatives focused on strengthening the connective tissue of the research ecosystem.

During this period, UC3 also contributed to building shared identifier infrastructure, including the Research Organization Registry (ROR) (NSF EAGER, 2020) and broader PID strategy efforts. This built directly on earlier exploration with OBFS and NAML on field station identifiers, where efforts to identify place-based environments shaped our thinking about organizational identity, disambiguation, and persistent identifiers. Together, these efforts established the ability to link research entities, while also making clear that identifiers alone are not sufficient unless embedded in real workflows.

During this same time, UC3 also worked closely with Dryad to evolve it into a platform for experimentation: collaborating on affiliation tracking, linking datasets to software and publications, and early integrations with data management planning workflows. These efforts moved infrastructure into practice, but also highlighted a limitation: much of our work remained downstream. The question of how to integrate these capabilities into place-based research environments remained.

FAIR Island: Infrastructure, Policy, and Practice

Over the years, UC3 has seen increasing alignment across our work in data management planning, persistent identifiers, and research outputs, and recognized an opportunity to bring these together in a place-based context. Through a partnership with UCNRS, and in particular with our frequent collaborators at the Gump South Pacific Research Station, we identified the field station reservation process as a strategic point of engagement where these elements could be introduced earlier in the research lifecycle.

UC3 made the case internally to pursue this work, leading to initial UC investment in what became the FAIR Island project. The core idea was to take the data policies and expectations typically expressed in grant applications and DMPs, and embed them directly into field station workflows. By working to integrate the DMPTool with the UCNRS Reservation Application Management System (RAMS), we began exploring how policy, identifiers, and metadata could be introduced at the point where researchers request access to field sites.

This work marked the beginning of a deeper collaboration across UCOP, bringing together research infrastructure and operational systems that support day-to-day field station activities. With subsequent NSF support, FAIR Island expanded to include additional partners, including Metadata Game Changers and the Tetiaroa Society, allowing us to test more complete, end-to-end workflows across planning, data collection, and downstream integration.

Our work on FAIR Island demonstrated the importance of the reservation process. It represents a moment where researchers are already providing structured information, making it a natural and effective place to introduce data policies, identifiers, and expectations that can carry forward through the rest of the research lifecycle. This began a shift from connecting data after the fact to embedding those connections where research begins.

FAIR Samples and vertical interoperability

Building on this foundation, our recent projects have expanded into data collection workflows and cross-system integration. The FAIR Samples project (NSF EAGER, 2024) focuses on improving how physical samples are identified, described, and connected across workflows. A key focus is integrating sample management systems with the broader research ecosystem. This work builds on the concept of vertical interoperability, developed by our partners at RSpace, which focuses on how information moves across layers of the research process, from planning tools and field data collection to lab systems and repositories.

Rather than introducing new systems, the FAIR Samples approach emphasizes connecting existing infrastructure, including IGSNs for samples, tools like FieldMark for structured field data capture, and platforms like RSpace for managing and linking workflows. Together, these integrations demonstrate how coordinated tools can support end-to-end workflows without requiring entirely new systems. 

FAIR Station: Bringing it all together 

This brings us to FAIR Station. This project is not a new direction, but a continuation of our broader efforts to connect place, samples, and data across the research lifecycle. 

Collaboration with UCNRS and work with its RAMS platform have been central to our efforts. What began as an operational system that was not easily extended has, through sustained UC investment, evolved into a platform that is now much better positioned for integration. That shift creates a new phase of opportunity for UC3 and UCNRS to work together on connecting planning, policy, identifiers, and downstream systems in ways that were not previously feasible.

It also opens the door to thinking beyond UC. As RAMS continues to mature, we have begun exploring how it can be extended and open sourced, creating a foundation that other field stations and research networks can adopt, adapt, and contribute to. The goal is not just to support a single system, but to help enable a broader platform where the global field station community can see themselves and participate. With funding from the Moore Foundation, we can now bring together this work and these partnerships to explore how field station systems can support more connected, interoperable research workflows at scale.

Looking ahead

Across UC3’s projects, there is a consistent way of working: connecting existing efforts, aligning with community practices, and building on infrastructure already in use rather than creating new systems in isolation. All of UC3’s work has only been possible through sustained support from funders and collaboration with communities and partners. That same model is essential for FAIR Station. We will continue to work with field stations, infrastructure providers, and partners like RSpace to extend systems like RAMS and support open, interoperable workflows.

Introducing FAIR Station

Across many areas of research, field stations and marine laboratories (FSMLs) are where science begins. This is where observations are made, samples are collected, and long-term studies take shape. Yet the systems that support this work are often focused on logistics alone: reserving space, coordinating access, and managing operations.

What happens next (how data are described, connected, and ultimately shared) is typically handled elsewhere, and often inconsistently. At UC3, we see this as a missed opportunity. Over the past decade, we have worked on different parts of this challenge. With the FAIR Station project, we are turning our attention upstream to the platform researchers already use to engage with field stations.

A moment that matters

Reserving time at a field station is one of the few universal touchpoints across place-based research. It is a moment where researchers, administrators, and institutional expectations come together. Today, that moment is largely administrative. It could also be where shared practices around data, metadata, and stewardship begin. 

The FAIR Station project is built on this idea. Rather than introducing entirely new systems, the FAIR Station approach starts with what already works and asks a different question: what becomes possible if this layer of the research lifecycle is opened up and connected?

Our work will build on the Reservation Application Management System (RAMS), developed by the University of California Natural Reserve System (UCNRS), and reimagine it as something more than a single-institution system. The goal is to evolve this foundation into an open, extensible platform that can be adopted, adapted, and integrated across a broader community.

Opening up the ecosystem

UC3’s work has consistently focused on enabling change at scale by working within existing research workflows. When systems align with how research actually happens, and when they interoperate with the broader ecosystem, they can support lasting and meaningful change. The FAIR Station project continues this approach.

We are working toward an open-source platform that acts as a connective layer between fieldwork and the rest of the research lifecycle. By prioritizing openness, modularity, and well-defined interfaces, the FAIR Station project is exploring how this layer can function as a hub that connects with research data management services, persistent identifier infrastructure, repositories, and related systems.

Exactly how this takes shape will be explored with the community. The opportunity is not only to improve individual workflows, but to enable new kinds of connections across the research ecosystem.

These are directions, not fixed endpoints. The FAIR Station project is being developed as a space for experimentation, iteration, and collaboration.

Working with the community

From the outset, the FAIR Station project is grounded in partnership. We are engaging field station staff, researchers, and organizations across the open infrastructure landscape to help shape what this effort becomes. Through advisory groups, pilot deployments, and collaborations with complementary services, we aim to ensure that the approach reflects real-world needs and remains adaptable across different contexts. This is especially important as we move from a system that has been successful within the University of California to something that can be used more broadly. Opening up this work is not just a technical step. It is a community process.

Now hiring: help shape what comes next

The FAIR Station project is still taking shape. While there is a strong foundation and a clear direction, many of the most important decisions (how this evolves, how it integrates with other systems, and how it serves different communities) are still ahead. With funding from the Moore Foundation, the FAIR Station team is hiring a Product Manager through Code for Science & Society (CS&S), our fiscal sponsor and a long-standing partner in supporting open, community-driven infrastructure.

This is a role for someone interested not just in building a product, but in helping define what this space can become. Working closely with UC3, UCNRS, CS&S, and partners across the FSML community, this person will help translate ideas into a clear roadmap, guide pilot work, and ensure that what emerges remains practical, open, and responsive to real needs.

If you are interested in helping shape the FAIR Station project, or know someone who might be, we encourage you to learn more about the position.

Why CDL Is Investing in COMET: A Community Centered Path to Richer Metadata

When the California Digital Library (CDL) signed the Barcelona Declaration in April 2025, it marked a deeper institutional commitment to building open and community-led research infrastructure. At the heart of this commitment is a recognition that metadata is not a passive byproduct of scholarship, but an active force that shapes how research is discovered, connected, cited, and reused. To build an ecosystem where metadata reflects the values of openness, equity, and trust, we must ensure that its stewardship is shared, inclusive, and sustainable.

This is why CDL’s University of California Curation Center (UC3) program is investing in COMET (Collaborative Metadata Enrichment Taskforce). COMET is both a vision and a framework for creating a healthier metadata ecosystem, one where persistent identifiers are enriched and maintained through transparent, distributed workflows that engage the full research community. Its principles are the building blocks of the COMET model and the foundation of CDL’s participation.

How COMET Emerged and CDL’s Participation

COMET emerged from a shared realization across the scholarly infrastructure community: if we want metadata that is trustworthy, complete, and actionable, we need to design systems that allow more people to contribute to it and more institutions to shape its governance. This vision came into sharper focus during a series of workshops at FORCE2024 held in Los Angeles and the Barcelona Declaration Community Meeting held in Paris, where participants from across disciplines and sectors gathered to discuss new models for collaborative metadata curation. These sessions surfaced a common theme: metadata enrichment can’t be sustained by individual repositories or publishers alone. What’s needed is a coordinated, community-powered model that invites researchers, libraries, funders, and infrastructure providers to play an active role in improving the quality of metadata tied to persistent identifiers.

Out of these conversations, COMET was born. By early 2025, COMET had evolved into a formal FORCE11 Project and culminated in an open “Community Call to Action” that invited broad participation in shaping workflows, tools, and governance models for metadata enrichment.

CDL was an early and enthusiastic supporter because the vision aligned with our mission and we saw an opportunity to help bring it to life. Our involvement isn’t passive. CDL’s UC3 program brings more than two decades of experience in digital curation, persistent identifier infrastructure, and open scholarly systems. We contribute governance know-how, technical insight from our work on initiatives like EZID, Crossref, ROR, and DataCite, and convening power across academic and infrastructure communities. We also see COMET as a proving ground: a space to pilot scalable, community-led metadata workflows that can extend across institutions, repositories, and disciplines.

For CDL, joining COMET is a continuation of our long-standing commitment to open, shared infrastructure and collective progress. It’s an investment in a future where metadata is openly enriched, transparently verified, and valued by the very communities who depend on it.

What Community Participation Means

When libraries and institutions like CDL engage with efforts like COMET, the benefits extend far beyond improved metadata. Our participation brings a deep commitment to equity, transparency, and public stewardship, values that help shape infrastructure for the public good. By contributing expertise in curation, governance, and metadata standards, libraries ensure that research information is more complete, discoverable, and reusable across repositories, researcher profiles, and campus systems.

Shared governance is a central feature of COMET’s approach, and institutional involvement helps ensure that decisions reflect the needs of a global, diverse, distributed community. When institutions engage in this work, they align their local priorities with broader efforts to create trustworthy, persistent, and openly governed metadata. This alignment reduces redundancy, increases impact, and builds capacity for meaningful contributions across the ecosystem.

But the benefits of this work aren’t just at the institutional level. For researchers and end users, the results are tangible: better discovery, clearer provenance, and richer metadata that supports citation, reuse, and reproducibility. And for funders, repositories, and service providers, this community-driven model offers a scalable alternative to siloed or proprietary solutions, one that emphasizes interoperability, transparency, and accountability.

That’s why we believe that COMET offers more than just a framework for metadata enrichment. It provides an opportunity for us to embody our mission-driven values and help build the connective infrastructure that research depends on. For CDL, supporting COMET is a way to double down on our long-standing commitment to open, community-led infrastructure. It’s about creating shared pathways to trust, equity, and impact where metadata isn’t hidden or locked down, but serves as the connective tissue for discovery and collaboration.

Proposed revisions to the Principles of Open Scholarly Infrastructure (POSI)

Sustainable, community-driven infrastructure is essential for advancing open scholarship. That’s why UC3 not only championed the Principles of Open Scholarly Infrastructure (POSI) through our advocacy and authorship but also actively supported their adoption by key organizations like ROR, Dryad, DataCite, and Make Data Count. POSI has provided an invaluable framework for transparency, accountability, and community alignment.

As we look toward the future, we’re thrilled to see the next evolution of POSI taking shape with the proposed POSI 2.0 revisions. These updates, informed by real-world experiences of adopters, aim to refine the principles to ensure they remain practical and relevant in an ever-changing landscape. This evolution is not just about updating a framework—it’s about strengthening the foundation for a more open and resilient scholarly ecosystem.

Why does this matter to UC3? As active stewards of open infrastructure, we know that collective input is key to shaping effective, inclusive principles. POSI has empowered organizations to perform self-assessments, build trust with their communities, and advocate for long-term sustainability. We’ve seen firsthand how these principles can elevate not just individual organizations but the entire ecosystem.

With POSI 2.0, we’re calling on the scholarly community to contribute to this critical conversation. The proposed revisions are open for public comment, and this is your chance to help ensure that POSI continues to reflect the needs and aspirations of our diverse community.

How to Get Involved

  1. Review the Proposed Revisions: Dive into the draft of POSI 2.0 and explore the updates.
  2. Share Your Feedback: Take the short survey to share your thoughts and insights.
  3. Spread the Word: Encourage your networks to join this important dialogue.

Deadline: March 5, 2025
Learn More & Participate: https://openscholarlyinfrastructure.org/public-comment-v2/

Introducing FAIR Samples

Physical samples are foundational to research, and much of this work begins in settings such as field stations, nature reserves, and marine laboratories. Yet the workflows for collecting, cataloging, managing, and publishing sample information are often disconnected from both the broader research lifecycle and the operational contexts in which the work begins. Even when identifier systems and metadata standards exist, they can be difficult to adopt in real-world field and station environments.

This is the focus of FAIR Samples, an NSF-funded EAGER project (NSF Award Number 2433320) co-led by UC3 and RSpace. The project centers on a pragmatic question: how do we reduce manual work, increase FAIRness, and create workflows that researchers and field station staff can actually adopt?

Where FAIR Samples comes from

FAIR Samples builds on work at UC3 and with partners focused on connecting early-stage research activity to downstream data and publication workflows. It applies these threads specifically to the sample lifecycle: identifying what needs to happen before, during, and after field collection so that sample context and identifiers can persist through to analysis, deposit, and reuse.

Our research partner, RSpace, brings a complementary perspective grounded in their work on sample management, electronic lab notebooks, and the integration of persistent identifiers into day-to-day research practice. As an early implementer of IGSN ID workflows, a DataCite service provider, and a frequent contributor to the Research Data Alliance (RDA), RSpace has focused on embedding identifier creation and metadata capture directly into the environments where researchers manage samples and experiments. This emphasis on usability and workflow integration aligns closely with their work on vertical interoperability and helps ensure that FAIR Samples is grounded in approaches that researchers can realistically adopt.

Next Steps

FAIR Samples is a research project exploring approaches to supporting end-to-end sample workflows in ways that align with how research actually happens. Building on the foundational work of the larger research infrastructure community, it focuses on embedding persistent identifiers early in the process, accommodating the realities of offline field collection, keeping metadata portable across tools, and ensuring that identifiers and their associated context can move with a sample from initial collection through to publication.

The workflow we are prototyping (end-to-end)

FAIR Samples is testing a workflow that stitches together existing open tools:

  1. Pre-register identifiers before fieldwork. Bulk-create IGSN IDs and generate printable QR/labels that can be applied to samples in the field.
  2. Collect sample metadata offline. Use FieldMark (offline-first mobile collection) to capture structured metadata, including scanning the IGSN QR into the record.
  3. Import into an inventory + lab context. Import FieldMark records into RSpace as structured sample templates and sample records so metadata is preserved and usable.
  4. Register and publish IGSNs with metadata. Complete required metadata and publish persistent landing pages for IGSNs when appropriate.
  5. Link samples to experiments. Connect samples to experimental records so “what was used” is captured as part of the research narrative.
  6. Deposit outputs to a repository with identifiers intact. Export bundles of documents/data to a repository (Dataverse as a proof-of-concept), including IGSN links as related materials so the identifier stays connected at the end of the workflow.
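To make step 1 concrete, here is a minimal sketch of pre-registering label data before fieldwork. This is a hypothetical illustration, not the project’s actual tooling: real IGSN IDs are minted through a registration service such as DataCite, so the prefix, resolver URL, and field names below are all assumptions. The sketch simply bulk-generates draft identifiers and writes a label manifest that a QR/label-printing step could consume.

```python
import csv
import io
import uuid

# Hypothetical values: real IGSN IDs come from a registration service
# (e.g. DataCite), not from local generation like this.
PREFIX = "10.12345"                     # placeholder DOI-style prefix (assumption)
RESOLVER = "https://example.org/igsn/"  # placeholder resolver URL (assumption)

def draft_sample_ids(campaign: str, count: int) -> list[dict]:
    """Pre-generate draft identifiers and QR payloads for field labels."""
    records = []
    for _ in range(count):
        suffix = uuid.uuid4().hex[:8]
        identifier = f"{PREFIX}/{campaign}-{suffix}"
        records.append({
            "identifier": identifier,
            # The QR code on the printed label encodes a resolvable URL,
            # so a scan in the field can attach the ID to the record.
            "qr_payload": f"{RESOLVER}{identifier}",
            "campaign": campaign,
        })
    return records

def label_manifest_csv(records: list[dict]) -> str:
    """Serialize label records to CSV for a label-printing workflow."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["identifier", "qr_payload", "campaign"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

ids = draft_sample_ids("moorea-2024", 3)
print(label_manifest_csv(ids))
```

The point of the sketch is the ordering, not the code itself: identifiers exist, and are physically attached to samples, before any metadata is captured, which is what lets the downstream steps link everything back to the same ID.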

Get involved

We’re actively looking for input from communities that manage samples and deposit to domain repositories, especially around common submission workflows (including manual steps), metadata schema crosswalk needs, and the best “handoff points” between tools. Please reach out to us or our partners at RSpace if you would like to collaborate or discuss!

Understanding the Vision Behind Make Data Count and the Open Global Data Citation Corpus

As the scientific community increasingly embraces open data, the question of how these datasets are being accessed and utilized becomes ever more pressing. Researchers, funders, and policymakers alike are keen to understand the impact and reach of the data they produce, support, and use. This is where the vision of Make Data Count (MDC) and the Data Citation Corpus comes into play.

What is Make Data Count?

Make Data Count is an international initiative aimed at transforming how we measure the impact of open research data. Traditionally, the scholarly community has focused on citations to articles as a metric of impact. However, as research becomes more data-intensive, it’s clear that we need new metrics to capture the influence and reuse of datasets. MDC is committed to developing evidence-based data metrics that go beyond traditional measures, allowing for a more comprehensive understanding of data usage.

MDC’s efforts focus on creating the infrastructure and standards needed to track, collect, and report data usage and citation metrics. This includes not only citations to datasets within scholarly articles but also how data is used across various fields and sectors. The ultimate goal is to provide a holistic view of how open data contributes to scientific progress, policy-making, and beyond.

For more details on the roadmap and future developments of Make Data Count, you can explore the MDC Roadmap.

CDL’s Role in Make Data Count

The University of California Curation Center (UC3) at the California Digital Library (CDL) has been a key player in the Make Data Count initiative since its inception. CDL’s expertise in managing collaborative projects and its commitment to open data practices have been instrumental in the development and implementation of MDC’s goals. Over the years, CDL team members have provided strategic oversight and technical infrastructure. Currently, CDL team members serve on MDC’s advisory committee and work with other key partners, such as DataCite and the MDC Director, Iratxe Puebla, on MDC project execution. CDL continues to play a vital role in fostering collaborations with other institutions and organizations to expand the reach and impact of MDC.

A Centralized Resource for Data Citations

The Data Citation Corpus, developed in collaboration with the Chan Zuckerberg Initiative (CZI) and the Wellcome Trust, is a cornerstone of this vision. The Corpus aims to be a vast, open repository of data citations from diverse sources and disciplines, providing a centralized resource for understanding how data is being cited and reused.

This initiative addresses a significant challenge in the current landscape: the fragmented and incomplete nature of data citation information. While data citations are increasingly being created, the existing workflows for collecting and propagating these citations are often leaky, leading to gaps in the persistent identifier (PID) metadata. Furthermore, in some fields, especially within the life sciences, data sharing via repositories that use accession numbers instead of DOIs is common, which further complicates the collection of metadata on data reuse.
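The mixed landscape described above (DOIs alongside repository accession numbers) is partly a normalization problem: before data mentions can be matched and aggregated, they need to be recognized and bucketed. The sketch below is a simplified, hypothetical illustration of that first step; the patterns are assumptions for demonstration, not the Corpus’s actual matching rules.

```python
import re

# Simplified patterns (assumptions, not the Corpus's real logic):
# a DOI starts with "10.", a registrant code, then "/" and a suffix;
# the accession pattern loosely mimics GenBank-style identifiers.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")

def classify_identifier(raw: str) -> str:
    """Bucket a raw data mention as 'doi', 'accession', or 'unknown'."""
    token = raw.strip().removeprefix("https://doi.org/")
    if DOI_RE.match(token):
        return "doi"
    if ACCESSION_RE.match(token):
        return "accession"
    return "unknown"
```

For example, `classify_identifier("10.5061/dryad.abc123")` and `classify_identifier("MN908947.3")` would land in different buckets, so DOI-based and accession-based citations can flow through different resolution paths before being merged into one corpus.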

The corpus is being developed in iterative stages, with the initial prototype already incorporating data citations from DataCite event data and the CZI Knowledge Graph. This prototype allows for visualizations based on parameters like institution or data repository, providing valuable insights into how datasets are being cited and used across the research ecosystem.

As the project progresses, the goal is to expand the Data Citation Corpus to include additional sources and features, ultimately creating a resource that different stakeholders—researchers, funders, institutions, and policymakers—can use to integrate data usage information into their work. 

Expanding the Corpus and Engaging the Community

To further the goals of expanding and refining the Data Citation Corpus, MDC is hosting a hackathon on September 4, 2024, focused on building curation workflows for the corpus. The hackathon will bring together data scientists, developers, and engineers to work on two key projects: developing user interfaces for the corpus and creating workflows for community-driven curation of data citations.

The hackathon will take place in two locations, with sessions at the Wellcome Trust in London and the California Digital Library in Oakland, California. Participants will collaborate on innovative solutions that will be presented the following day at the MDC Summit.

Stay tuned for a follow-up post where we will share the outcomes of the hackathon and the exciting developments that emerge from this collaborative effort.

Developing a US National PID Strategy

Advancing Research through a Unified National PID Strategy

In a recent project facilitated by the Open Research Funders Group (ORFG) and Research Data Alliance US (RDA US), the focus has been on developing recommendations for a US National Strategy for Persistent Identifiers (PIDs). Co-chaired by me and Todd Carpenter, the ORFG PID Strategies Working Group worked to outline the benefits, challenges, and future steps for a US national approach to PIDs.

Current Landscape and the Need for a National Strategy

The US has actively participated in various international efforts, such as UNESCO’s Open Science toolkit. We also have several national-level guidance documents, such as the Holdren Memo, the Nelson Memo, and the National Security Strategy for United States Government-Supported Research and Development. However, a national strategy for PIDs has not yet been developed. Recognizing this gap, the ORFG PID Strategies Working Group published a set of recommendations. These recommendations, available on Zenodo, aim to improve the application and interoperability of PIDs across the US research community.

Benefits of Adopting PIDs

The adoption of PIDs brings numerous benefits across the research ecosystem.

Developing a US National PID Strategy

The process of developing a national PID strategy involves several steps:

  1. Community Engagement: Gathering input from various stakeholders, including government agencies, academic institutions, and researchers.
  2. Technical Implementation: Upgrading legacy systems to modern PID infrastructures, ensuring they meet desirable characteristics such as persistence, global uniqueness, and interoperability.
  3. Governance and Support: Establishing centralized governance structures to manage PID systems and provide ongoing support for their adoption and use.

Moving Beyond Legacy Systems

Legacy systems often lack the granularity and interoperability needed for modern research management. Transitioning from these systems to more sustainable and accessible PID infrastructures is essential. This involves technological updates, workflow changes, and stakeholder engagement to ensure a smooth transition.

Centralized PID Infrastructure

Supporting centralized PID infrastructures is crucial for a unified approach to research management. Centralized systems provide a single source of truth, addressing the diverse needs of stakeholders and fostering collaboration across the research ecosystem.

Areas for Investment

To support the transition to a national PID strategy, investment is needed in several areas.

Next Steps and Measuring Success

The ORFG PID Strategies Working Group has submitted a proposal to the National Information Standards Organization (NISO) to develop these recommendations into a national standard. The process will include public consultations, community participation, and rigorous vetting to ensure the standard meets the needs of the research community.

Conclusion

As we move forward with these initiatives, it is essential for organizations to reflect on their current use of PIDs and consider how they can support the national strategy. By working together, we can enhance the effectiveness, transparency, and impact of research across the United States and beyond.

For more information and to get involved, please refer to the resources linked below:

We extend our gratitude to the Open Research Funders Group, Helios Open, SPARC, RDA US, and the Pervasive Technology Institute at Indiana University for their support in this endeavor. We look forward to continued collaboration as we advance towards a comprehensive national PID strategy. If you are interested in getting involved, please review our report on Zenodo and join the community discussion at PID Forum: https://pidforum.org/t/developing-a-us-national-pid-strategy-report

Lessons learned from organizing the first ever virtual csv,conf

This blogpost was collaboratively written by the csv,conf organizing team which includes John Chodacki from CDL. csv,conf is supported by the Sloan Foundation and the Moore Foundation. The original post can be found here: https://csvconf.com/going-online

A brief history

csv,conf is a community conference that brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. Over the years we have had over a hundred different talks from a huge range of speakers, most of which you can still watch back on our YouTube Channel.

csv,conf,v1 took place in Berlin in 2014 and we were there again for v2 in 2016 before we moved across the Atlantic for v3 and v4 which were held in Portland, Oregon in the United States in 2017 and 2019. For csv,conf,v5, we were looking forward to our first conference at the University of California Center in Washington DC, but unfortunately, like many other in-person events, this was not going to be possible in 2020.

People have asked us about our experience moving from a planned in-person event to one online, in a very short space of time, so we are sharing our story with the hope that it will be helpful to others, as we move into a world where online events and conferences are going to be more prevalent than ever.

The decision to take the conference online was not an easy one. Until quite late on, the question csv,conf organizers kept asking each other was not “how will we run the conference virtually?” but “will we need to cancel?“. As the pandemic intensified, this decision was taken out of our hands and it became quickly clear that cancelling our event in Washington D.C. was not only the responsible thing to do, but the only thing we could do.

Weighing the decision to hold csv,conf,v5 online

Once it was clear that we would not hold an in-person event, we deliberated on whether we would hold an online event, postpone, or cancel.

Moving online – The challenge

One of our main concerns was whether we could encapsulate everything good about csv,conf in a virtual setting: the warmth you feel when you walk into the room, the interesting side conversations, and the feeling of being reunited with old friends and naturally meeting new ones. We didn’t know whether we could pull these off, and if we couldn’t, did we want to do this at all?

We were worried about keeping a commitment to speakers who had made a commitment themselves. But at the same time we were worried speakers may not be interested in delivering something virtually, or that it would not have the same appeal. It was important to us that there was value to the speakers, and at the start of this process we were committed to making this happen.

Many of us have experience running events both in person and online, but this was bigger. We had some great advice and drew heavily on the experience of others in similar positions to us. But it still felt like this was different. We were starting from scratch and for all of our preparation, right up to the moment we pressed ‘go live’ inside Crowdcast, we simply didn’t know whether it was going to work.

But what we found was that hard work, lots of planning and support of the community made it work. There were so many great things about the format that surprised and delighted us. We now find ourselves asking whether an online format is in fact a better fit for our community, and exploring what a hybrid conference might look like in the future.

Moving online – The opportunity

There were a great many reasons to embrace a virtual conference. Once we made the decision and started to plan, this became ever clearer. Not least was the fact that an online conference would give many more people the opportunity to attend. We work hard every year to reduce the barriers to attendance where possible and we’re grateful to our supporters here, but our ability to support conference speakers is limited and it is also probably the biggest cost year-on-year. We are conscious that barriers to entry still apply to a virtual conference, but they are different and it is clear that for csv,conf,v5 more people who wanted to join could be part of it. csv,conf is normally attended by around 250 people. The in-person conferences usually fill up with just a few attendees under capacity. It feels the right size for our community. But this year we had over 1,000 registrations. More new people could attend and there were also more returning faces.


Attendees joined csv,conf,v5’s opening session from around the world

Planning an online conference

Despite the obvious differences, much about organizing a conference remains the same whether virtual or not. Indeed, by the time we made the shift to an online conference, much of this work had already been done.

Organizing team

From about September 2019, the organizing team met up regularly every few weeks on a virtual call. We reviewed our task list and assigned actions. We used a private channel on Slack for core organizers to keep updated during the week.

We had a good mix of skills and interests on the organizing team from community wranglers to writers and social media aces.

We would like to give a shout out to the team of local volunteers we had on board to help with DC-specific things. In the end this knowledge just wasn’t needed for the virtual conf.

We recruited a group of people from the organizing team to act as the program committee. This group would be responsible for running the call for proposals (CFP) and selecting the talks.

We relied on our committed team of organizers for the conference and we found it helpful to have very clear roles/responsibilities to help manage the different aspects of the ‘live’ conference. We had a host who introduced speakers, a Q&A/chat monitor, a technical helper and a Safety Officer/Code of Conduct enforcer at all times. It was also helpful to have “floaters” who were unassigned to a specific task, but could help with urgent needs.

Selecting talks

We were keen on making it easy for people to complete the call for proposals. We set up a Google form and asked just a few simple questions.

All talks were independently reviewed and scored by members of the committee, and we had a final meeting to review our scores and come up with a final list. We were true to the scoring system, but there were other things to consider. Some speakers had submitted several talks, and we had decided that even if several talks by the same person scored highly, only one could go into the final schedule. We value diversity of speakers, so we reached out to diverse communities to advertise the call for proposals and also considered diversity when selecting talks. Also, where talks scored equally, we wanted to ensure we were giving priority to speakers who were new to the conference.
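The core of the process above, averaging reviewer scores and allowing at most one talk per speaker, can be sketched in a few lines. This is an illustration of the selection logic only (the data shapes and function name are hypothetical, not the committee's actual tooling, and it leaves out the human judgment calls on diversity and new speakers):

```python
from statistics import mean

def select_talks(submissions, slots):
    """submissions: list of dicts with 'title', 'speaker', and 'scores'
    (a list of reviewer scores). Returns up to `slots` talks, highest
    average score first, with at most one talk per speaker."""
    ranked = sorted(submissions, key=lambda s: mean(s["scores"]), reverse=True)
    chosen, seen_speakers = [], set()
    for talk in ranked:
        if talk["speaker"] in seen_speakers:
            continue  # only one talk per speaker makes the schedule
        seen_speakers.add(talk["speaker"])
        chosen.append(talk)
        if len(chosen) == slots:
            break
    return chosen
```

In practice a script like this produces a shortlist, and the committee then applies the softer criteria by hand.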

We asked all speakers to post their slides onto the csv,conf Zenodo repository. This was really nice to have because attendees asked multiple times for links to slides, so we could simply send them to the Zenodo collection.

Though it proved not to be relevant for the 2020 virtual event, it’s worth mentioning that the process of granting travel or accommodation support to speakers was entirely separate from the selection criteria. Although we asked people to flag a request for support, this did not factor into the decision making process.

Creating a schedule

Before we could decide on a schedule, we needed to decide on the hours and timezones we would hold the conference. csv,conf is usually a two-day event with three concurrently run sessions, and we eventually decided to have the virtual event remain two days, but have one main talk session with limited concurrent talks.

Since the in-person conference was supposed to occur in Washington, D.C., many of our speakers were people in US timezones so we focused on timezones that would work best for those speakers. We also wanted to ensure that our conference organizers would be awake during the conference. We started at 10am Eastern, which was very early for West Coast (7am) and late afternoon for non-US attendees (3pm UK; 5pm Eastern Europe). We decided on seven hours of programming each day, meaning the conference ended in late afternoon for US attendees and late evening for Europe. Unfortunately, these timezones did not work for everyone (notably the Asia-Pacific region), and we recommend picking timezones that work for your speakers and your conference organizers, stretching them as far as possible if equal accessibility is important to you. We also found it was important to clearly list the conference times in multiple timezones on our schedule so that it was easier for attendees to know when the talks were happening.
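Producing that multi-timezone listing is easy to script rather than compute by hand. A small sketch using Python's standard-library zoneinfo, with the day-one start time and the zones mentioned above (the choice of Bucharest to stand in for "Eastern Europe" is ours):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Day-one start: 10am US Eastern on 13 May 2020.
start = datetime(2020, 5, 13, 10, 0, tzinfo=ZoneInfo("America/New_York"))

for zone in ("America/Los_Angeles", "Europe/London", "Europe/Bucharest"):
    local = start.astimezone(ZoneInfo(zone))
    print(f"{zone}: {local:%H:%M}")
# America/Los_Angeles: 07:00
# Europe/London: 15:00
# Europe/Bucharest: 17:00
```

Using IANA zone names rather than fixed UTC offsets means daylight-saving rules are handled for you, which matters for a May event.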

Tickets and registration

Although most of what makes csv,conf successful is human passion and attention (and time!), we also found that the costs involved in running a virtual conference are minimal. Except for some extra costs for upgrading our communication platforms, and making funds available to support speakers in getting online, running the conference remotely saved us several thousand dollars.

We have always used an honour system for ticket pricing. We ask that people pay what they can afford, with some suggested amounts depending on the attendee’s situation. But we needed to make some subtle changes for the online event, as it was a different proposition. We first made it clear that tickets were free, and refunded those who had already purchased tickets.

Eventbrite is the platform we have always used for registering attendees for the conference, and it does the job. It’s easy to use and straightforward. We kept it running this year for consistency and to keep our data organized, even though it involved importing the data into another platform.

We were able to make the conference donation based thanks to the support of the Sloan Foundation, Moore Foundation, and individual contributors and donations. Perhaps because overall registrations went up, we found that donations went up too. In future, and with more planning and promotion, it would be feasible to consider a virtual event of the scale of csv,conf funded entirely by contributions from the community it serves.

Code of Conduct

We spent significant time enhancing our Code of Conduct for the virtual conference. We took in feedback from last year’s conference and reviewed other organizations’ Codes of Conduct. The main changes were to consider how a Code of Conduct needed to relate to the specifics of something happening online. We also wanted to create more transparency in the enforcement and decision-making processes.

One new aspect was the ability to report incidents via Slack. We designated two event organizers as “Safety Officers”, and they were responsible for responding to any incident reports and were available for direct messaging via Slack (see the Code of Conduct for full details). We also provided a neutral party to receive incident reports if there were any conflicts of interest.

Communication via Slack

We used Slack for communication during the conference, and received positive feedback about this choice. We added everyone that registered to the Slack channel to ensure that everyone would receive important messages.

We had a Slack session bot that would announce the beginning of each session with the link to the session and we received a lot of positive feedback about the session-bot. For people not on Slack, we also had the schedule in a Google spreadsheet and on the website, and everyone that registered with an email received the talk links via email too. For the session bot, we used the Google Calendar for Team Events app on Slack.
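For the curious, an announcement bot like this needs very little machinery. The sketch below builds the kind of message payload Slack's incoming-webhook API accepts; it is illustrative only, since (as noted above) we actually used the Google Calendar for Team Events app rather than a hand-rolled bot, and the function name and message wording here are our own invention:

```python
import json

def session_announcement(title, speaker, url):
    """Build a Slack incoming-webhook JSON payload announcing a session.
    Illustrative sketch; csv,conf used the Google Calendar for Team
    Events app rather than a custom bot."""
    text = f":loudspeaker: Starting now: *{title}* by {speaker}\nJoin here: {url}"
    return json.dumps({"text": text})

# Posting is then a single HTTP POST of this JSON to your webhook URL
# (e.g. with urllib.request); the webhook URL itself comes from Slack's
# app configuration.
```

The advantage of a calendar-driven app over a custom bot is that the schedule lives in one place and the announcements follow it automatically.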

Another popular Slack channel that was created for this conference was a dedicated Q&A channel allowing speakers to interact with session attendees, providing more context around their talks, linking to resources, and chatting about possible collaborations. At the end of each talk, one organizer would copy all of the questions and post them into this Q&A channel so that the conversations could continue. We received a lot of positive feedback about this and it was pleasing to see the conversations continue.

We also had a dedicated speakers channel, where speakers could ask questions and offer mutual support and encouragement both before and during the event.

Another important channel was a backchannel for organizers, which we used mainly to coordinate and cheer each other on during the conf. We also used this to ask for technical help behind the scenes to ensure everything ran as smoothly as possible.

After talks, one organizer would use Slack private messaging to collate and send positive feedback for speakers, as articulated by attendees during the session. This was absolutely worth it and we were really pleased to see the effort was appreciated.

Slack is of course free, but it offers free upgrades to its premium service for charities, and we were lucky enough to make use of this. The application process is very easy and takes less than 10 minutes, so it is worth considering.

We made good use of Twitter throughout the conference and there were active #commallama and #csvconf hashtags going throughout the event. The organizers had joint responsibility for this and this seemed to work. We simply announced the hashtags at the beginning of the day and people picked them up easily. We had a philosophy of ‘over-communicating’ – offering updates as soon as we had them, and candidly. We used it to share updates and calls-to-action, and to amplify people’s thoughts, questions and feedback.

Picking a video conference platform

Zoom concerns

One of the biggest decisions we had to make was picking a video conferencing platform for the conference. We originally considered using Zoom, but were concerned about a few things. The first was reports of rampant “zoombombing”, where trolls join Zoom meetings with the intent to disrupt the meeting. The second concern was that we are a small team of organizers and there would be great overhead in moderating a Zoom room with hundreds of attendees – muting, unmuting, etc. We also worried that a giant Zoom room would feel very impersonal. Many of us now spend what is probably an unnecessary amount of our daily lives on Zoom and we also felt that stepping away from this would help mark the occasion as something special, so we made the decision to move away from Zoom and looked to options that were more of a broadcast tool than a meeting tool.

Crowdcast benefits

We saw another virtual conference that used Crowdcast and were impressed with how it felt to participate, so we started to investigate it as a platform before enthusiastically committing to it, with some reservations.

The best parts of Crowdcast to us were the friendly user interface, which includes a speaker video screen, a dedicated chat section with a prompt bar reading “say something nice”, and a separate box for questions. It felt really intuitive, and the features were well considered and useful; we incorporated most of them.

From the speaker, participant and host side, the experience felt good and appropriate. The consideration given to the different user types was clear in the design and appreciated. One great function was that of a green room, which is akin to a speakers’ couch at the backstage of an in-person conference, helping to calm speakers’ nerves, check their audio and visual settings, discuss cues, etc. before stepping out onto the stage.

Another benefit of Crowdcast is that the talks are immediately available for viewing, complete with chat messages for people to revisit after the conference. This was great as it allowed people to catch up in almost real time if they missed something on the day, and to feel part of the conference discussions as they developed. We also released all talk videos on YouTube and tweeted the links to each talk.

Crowdcast challenges

But Crowdcast was not without its limitations. Everything went very well, and the following issues were not deal breakers, but acknowledging them can help future organizers plan and manage expectations.

Top of the list of concerns was our complete inexperience with it and the likely inexperience of our speakers. To ensure that our speakers were comfortable using Crowdcast, we held many practice sessions with speakers before the conference, and also had an attendee AMA before the conference to get attendees acquainted with the platform. These sessions were vital for us to practice all together and this time and effort absolutely paid off! If there is one piece of advice you should take away from reading this guide it is this: practice practice practice, and give others the opportunity and space to practice as well.

One challenge we faced was hosting – only one account has host privileges, but we learned that many people can log into that account at the same time to share host privileges. Hosts can allow other people to share their screen and unmute, and they can also elevate questions from the chat to the questions box. They can also kick people out if they are being disruptive (which didn’t happen for us, but we wanted to be prepared). This felt a bit weird, honestly, and we had to be careful to be aware of the power we had when in the host’s position. Weird, but also incredibly useful and a key control feature which was essential for an event run by a group rather than an individual.

With Crowdcast, you can only share four screens at a time (so that would be two people sharing two screens). Our usual setup was a host, with one speaker sharing their screen at a time. We could add a speaker for the talks that only had a single other speaker, but any more than this and we would have had problems.

It was easy enough for the host to chop and change who is on screen at any time, and there’s no limit on the total number of speakers in a session. So there is some flexibility, and ultimately, we were OK. But this should be a big consideration if you are running an event with different forms of presentation.

Crowdcast was also not without its technical hiccups and frustrations. Speakers sometimes fell off the call or had mysterious problems sharing their screens. We received multiple comments/questions on the day about the video lagging/buffering. We often had to resort to the ol’ refresh refresh refresh approach which, to be fair, mostly worked. And on the few occasions we were stumped, there’s quite a lot of support available online and directly from Crowdcast. But honestly, there were very few technical issues for a two-day online conference.

Some attendees wanted info on the speakers (ex: name, twitter handle) during the presentation and we agree it would have been a nice touch to have a button or link in Crowdcast. There is the “call to action” feature, but we were using that to link to the code of conduct.

Crowdcast was new to us, and new to many people in the conference community. As well as these practices we found it helpful to set up an FAQ page with content about how to use Crowdcast and what to expect from an online conference in general. Overall, it was a good decision and a platform we would recommend for consideration.

#commallama

Finally, it would not be csv,conf if it had not been for the #commallama. The comma llama first joined us for csv,conf,v3 in Portland and joined us again for csv,conf,v4. The experience of being around a llama is both relaxing and energising at the same time, and a good way to get people mixing.

Taking the llama online was something we had to do and we were very pleased with how it worked. It was amazing to see how much joy people got out of the experience and also interesting to notice how well people naturally adapted to the online environment. People naturally organized into a virtual queue and took turns coming on to the screen to screengrab a selfie. Thanks to our friends at Mtn Peaks Therapy Llamas & Alpacas for being so accommodating and helping us to make this possible.

A big thank you to our community and supporters

As we reflect on the experience this year, one thing is very clear to us: the conference was only possible because of the community that spoke, attended, and supported us. It was a success because the community showed up, was kind, welcoming and extremely generous with their knowledge, ideas and time. The local people in D.C. who stepped up to offer knowledge and support on the ground were a great example of this, and we are incredibly grateful for that support, even though it turned out not to be needed.

We were lucky to have a community of developers, journalists, scientists and civic activists who intrinsically know how to interact and support one another online, and who adapted to the realities of an online conference well. From the moment speakers attended our practice sessions on the platform and started to support one another, we knew that things were going to work out. We knew things would not all run to plan, but we trusted that the community would be understanding and actively support us in solving problems. It’s something we are grateful for.

We were also thankful to Alfred P. Sloan Foundation, Moore Foundation, and our 100+ individual supporters for making the decision to support us financially. It is worth noting that none of this would have been possible without our planned venue, hotel and catering contracts being very understanding in letting us void our contracts without any penalties.

Looking ahead – the future of csv,conf

Many people have been asking us about the future of csv,conf. Firstly, it’s clear that csv,conf,v5 has given us renewed love for the conference and made the need for a conference like this in the world abundantly clear to us. It’s also probably the case that the momentum generated by running the conference this year will secure enthusiasm amongst organizers for putting something together next year.

So the question will be “what should a future csv,conf look like?”. We will certainly be considering our experience of running this year’s event online. It was such a success that there is an argument for keeping it online going forward, or putting together something of a hybrid. Time will tell.

We hope that this has been useful for others. If you are organizing an event and have suggestions or further questions that could improve this resource, please let us know. Our Slack remains open and is the best place to get in touch with us.

The original version of this blogpost was published on csvconf.com and republished here with kind permission.

csv,conf,v5 moves online

csv,conf is a non-profit community conference run by folks who really love data and sharing knowledge. The first two years, organizers established the event’s scope and community in Berlin, Germany. The third and fourth year, the organizers moved the event to Portland, Oregon. And, starting this year, we hoped to move the event to Washington, DC and host csv,conf,v5 at the University of California Center in the nation’s capital. However, with the ongoing pandemic, we have moved the conference online.

Check out the csv,conf,v5 schedule at https://csvconf.com/speakers/ 

On May 13-14, 2020, the fifth version of csv,conf will be held virtually. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas and kickstart collaborations. You are welcome to attend, but you must register by the end of day on May 12.

Register for csv,conf,v5 at https://csvconfv5.eventbrite.com 

What is csv,conf?

Over the past several years, UC3 has worked with partners at The Carpentries, Open Knowledge International, DataCite, rOpenSci, and Code for Science and Society to organize csv,conf (https://csvconf.com).  For those that aren’t familiar with the concept, csv,conf brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. 

Although a ubiquitous term, the acronym CSV has varied meanings depending on who you ask. In the data space, CSV often translates to comma-separated values – a machine-readable data format used to store tabular data in plain text. To many, the format represents simplicity, interoperability, compactness, hackability, among other things. 

From when it first launched in July 2014 as a conference for data makers everywhere, csv,conf adopted the comma-separated-values format in its branding metaphorically. Needless to say, as a data conference that brings together people from different disciplines and domains, conversations and anecdotes shared at csv,conf are not limited to the CSV file format. 

Check out past conference sessions on our YouTube channel.

Join us online

Make sure to check out the csv,conf,v5 schedule at https://csvconf.com/speakers/ and register for csv,conf,v5 at https://csvconfv5.eventbrite.com 

The UC3 team is excited to be part of the conference committee and happy to answer any questions you may have. Feel free to reach out to us at uc3@ucop.edu or to the full committee at csv-conf-coord@googlegroups.com.


Farewell and Thank You to Chris Erdmann

UC3 will bid farewell to Chris Erdmann on September 30th. Chris joined UC3 in May 2018 as Library Carpentry Community & Development Director and has spent the past year and a half expanding the Library Carpentry community in many ways. Chris is moving on to a new role at UNC Chapel Hill, but he will continue to be involved in Library Carpentry as a lesson maintainer and Advisory Group member.

We’ve cross-posted a farewell and final reflection that Chris published on The Carpentries blog.


September 30th will be my last day in the role of Library Carpentry Community & Development Director. I have been fortunate to meet so many amazing people working in libraries and the research community during this time. Thank you to the IMLS, the California Digital Library, and The Carpentries for this great opportunity. So many members of the community have helped Library Carpentry grow these past couple of years, not to mention the initial hard work that went into starting Library Carpentry. Together we have moved Library Carpentry to a formal Lesson Program in The Carpentries. We have welcomed new community members and run more workshops and events around the world. We have improved and expanded the curriculum thanks to the efforts of a diverse community of Maintainers and contributors from around the world. The Curriculum Advisory Committee and the Advisory Group continue to provide invaluable guidance on how we can move forward. Libraries have become an important part of The Carpentries Membership (over 60% are members) and thanks to additional support from the IMLS, libraries will continue to be an important part of the continued success of The Carpentries. I think Elaine Westbrooks, the University Librarian at University of North Carolina Chapel Hill (where I will be headed in October), said it best in her post about the importance of libraries in The Carpentries, The Strategic Value of Library Carpentry and The Carpentries to Research Libraries.

I will continue to be a member of the community, as a Maintainer on lessons, as a member of the Advisory Group, and will continue to teach and participate in discussions, so this is not goodbye. Instead, I will close by sharing some of the fun stories I have had with community members this past year and a half:

Tim Dennis reached out to me right when I started and invited me to teach a workshop at UCLA with him, and then weeks later, Tim and Jamie Jamison helped with hosting the Mozilla Global Sprint from the UCLA Library Data Science Lab. I think all of us were on a sugar high during the sprint.

@LibCarpentry #MozSprint @ucla_ssda @UCLA_YRL the cakes have arrived pic.twitter.com/XbFKq8DcEI

— Tim Dennis (@jt14den) May 10, 2018

CarpentryCon was a rush, meeting many members of the community, but I will never forget my reenactment of Run Lola Run through the streets of Dublin with David Kane to get to the CarpentryCon dinner on time or my bus ride through the Irish countryside with Scott Peterson and Daniel Bangert.

Post #CarpentryCon2018, had a great time with @enigmaticocean and @scottcpeterson2 exploring the Irish countryside today! pic.twitter.com/pFOkAQhEbO

— Chris Erdmann (@libcce) June 2, 2018

Thanks to Birgit Schmidt for inviting me to LIBER 2018 to speak about Library Carpentry and The Carpentries. This later led to a Carpentries Instructor Training at LIBER 2019 at UCD Library in Dublin. After LIBER, I was able to join Katrin Leinweber, Mateusz Kuzak, Konrad Förstner and others at the TIB Hannover FAIR Data & Software Carpentries-Based Workshop. This workshop was an inspiration in so many ways!

.@konradfoerstner is again leading us in an interactive activity binning ourselves into how familiar we are with creating installation packages, automated tests, continuous integration #TIBFDS pic.twitter.com/YBsKhr3HFm

— Chris Erdmann (@libcce) July 11, 2018

At the August 2019 University of Calgary Instructor Training, I met so many people that would ultimately become community members helping Library Carpentry grow. I handed out Effin Birds mugs as prizes and was finally able to see Lake Louise with Juliane Schneider. Oh Canada, you’re beautiful.

Well worth the wait 😀 pic.twitter.com/MqClGciSTb

— Chris Erdmann (@libcce) September 1, 2018

Australian Research Data Commons (ARDC) inspired a global sprint in November 2018, for us to try out a new format, Top 10 FAIR Data & Software Things. The event brought in new members and allowed us all to develop guides on what FAIR meant according to disciplines and/or topics. It was a fun experience ending the day talking to colleagues in Australia and waking up the next day talking to colleagues in Europe.

Great to meet @matkuzak @KristinaHettne Peter Verhaar and others from Leiden and Utrecht. Good luck finishing the #Top10FAIR Sprint today! pic.twitter.com/xqroy6ucC6

— Natasha Simons (@n_simons) November 30, 2018

MTSU invited us to do a three-day workshop at the start of 2019. Somehow we pulled it off with Juliane Schneider and me recovering from the flu and one of our instructors getting the flu just before the workshop. Anna Oates was able to avoid the flu with her super human immune system and delivered her first of many amazing training sessions. Of course, we had to go and see the Parthenon replica in Nashville.

“Nashville Parthenon” by schnaars is licensed under CC BY-SA 2.0

In February, ARDC hosted a Library Carpentry workshop tour of Australian cities. They continue to be such amazing supporters of The Carpentries along with so many others there. A special thanks to Natasha Simons for showing me Australia along the way.

Hello Brisbane! @n_simons and I are ready to say hello to @Griffith_Uni tomorrow! pic.twitter.com/BBpm0bV85w

— Chris Erdmann (@libcce) February 13, 2019

Electronic Resources & Libraries hosted its first Data Science in Libraries track in March 2019 inspired by The Carpentries efforts. It is great to see them offer the track once more, to have a Carpentries table at the exhibitor showcase, and to hopefully run workshops at the 2020 conference.

NESCLIC members Joshua Dull and @KristyDawnL running a session on jargon busting to give everyone at @ERandL #erl19 a taste of what a Library Carpentry includes https://t.co/GC3JaYX980 #DataScience #libraries pic.twitter.com/BjDU6DhHEW

— Library Carpentry (@LibCarpentry) March 5, 2019

Somewhere in all of this my wife and I had a baby and everyone has been supportive throughout. I remember one conversation though before I went on paternity leave with Yared Abera Ergu about work and family. It was one of those conversations about life that you have with community members on the side that makes working in this community so special.

Thank you all! I hope I have helped you as much as you have helped me. I will see you out there in the community.