Why CDL Is Investing in COMET: A Community-Centered Path to Richer Metadata
When the California Digital Library (CDL) signed the Barcelona Declaration in April 2025, it marked a deeper institutional commitment to building open and community-led research infrastructure. At the heart of this commitment is a recognition that metadata is not a passive byproduct of scholarship, but an active force that shapes how research is discovered, connected, cited, and reused. To build an ecosystem where metadata reflects the values of openness, equity, and trust, we must ensure that its stewardship is shared, inclusive, and sustainable.
This is why CDL’s University of California Curation Center (UC3) program is investing in COMET (Collaborative Metadata Enrichment Taskforce). COMET is both a vision and a framework for creating a healthier metadata ecosystem, where persistent identifiers are enriched and maintained through transparent, distributed workflows that engage the full research community. The principles below represent the building blocks of the COMET model and the foundation of CDL’s participation therein:
- Metadata Must Be Open and Reliable: As with our many efforts to make metadata freely accessible and machine-readable, COMET centers its work on improving completeness, consistency, and interoperability across the persistent identifier (PID) ecosystem. By working to ensure that basic PID metadata elements (e.g., title, author affiliations, publication dates) are openly available, CDL aims to dismantle paywalled research environments, ensuring that even the most basic scholarly facts are free, reusable, and trustworthy.
- Shared Stewardship Reduces Silos and Gaps: CDL understands the burden metadata creators face: original depositors often lack the resources to enrich records. Yet researchers, funders, and institutions have knowledge that could fill those gaps. COMET’s community-curation model is modeled after the success of ROR, which distributes responsibility for metadata improvements across the ecosystem.
- Community Governance Builds Trust and Quality: COMET is not proprietary. It prioritizes inclusive, transparent governance—publishing standards, embracing equitable practices, and grounding changes in real-world use cases. CDL’s track record with community-governed PID systems (Crossref, DataCite, ROR) aligns perfectly with COMET’s ethos.

CDL’s engagement with COMET reflects its “pathways” approach: working within existing metadata systems while facilitating new, collaborative routes for enrichment and shared stewardship across the ecosystem.
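The completeness problem the first principle describes can be checked programmatically. As a minimal, hypothetical sketch (the field names and the `missing_fields` helper are illustrative only, not a COMET or PID-registry schema), a repository could flag records whose basic elements have not yet been deposited:

```python
# Hypothetical sketch: flag PID metadata records lacking basic elements.
# The field names below are illustrative, not any registry's real schema.

REQUIRED_FIELDS = ["title", "author_affiliations", "publication_date"]

def missing_fields(record: dict) -> list:
    """Return the basic metadata elements absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

record = {
    "doi": "10.1234/example",        # illustrative DOI
    "title": "An Example Dataset",
    "publication_date": "2025-04-01",
    # author_affiliations not yet deposited
}

print(missing_fields(record))  # -> ['author_affiliations']
```

Records flagged this way are exactly the gaps a distributed-curation model lets institutions and researchers fill, rather than leaving the burden on the original depositor.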
How COMET Emerged and CDL’s Participation
COMET emerged from a shared realization across the scholarly infrastructure community: if we want metadata that is trustworthy, complete, and actionable, we need to design systems that allow more people to contribute to it and more institutions to shape its governance. This vision came into sharper focus during a series of workshops at FORCE2024 held in Los Angeles and the Barcelona Declaration Community Meeting held in Paris, where participants from across disciplines and sectors gathered to discuss new models for collaborative metadata curation. These sessions surfaced a common theme: metadata enrichment can’t be sustained by individual repositories or publishers alone. What’s needed is a coordinated, community-powered model that invites researchers, libraries, funders, and infrastructure providers to play an active role in improving the quality of metadata tied to persistent identifiers.
Out of these conversations, COMET was born. By early 2025, COMET had evolved into a formal FORCE11 Project and culminated in an open “Community Call to Action” that invited broad participation in shaping workflows, tools, and governance models for metadata enrichment.
CDL was an early and enthusiastic supporter because the vision aligned with our mission and we see an opportunity to help bring it to life. Our involvement isn’t passive. CDL’s UC3 program brings more than two decades of experience in digital curation, persistent identifier infrastructure, and open scholarly systems. We contribute governance know-how, technical insight from our work on initiatives like EZID, Crossref, ROR, and DataCite, and convening power across academic and infrastructure communities. We also see COMET as a proving ground: a space to pilot scalable, community-led metadata workflows that can extend across institutions, repositories, and disciplines.
For CDL, joining COMET is a continuation of our long-standing commitment to open, shared infrastructure and collective progress. It’s an investment in a future where metadata is openly enriched, transparently verified, and valued by the very communities who depend on it.
What Community Participation Means
When libraries and institutions like CDL engage with efforts like COMET, the benefits extend far beyond improved metadata. Our participation brings a deep commitment to equity, transparency, and public stewardship, values that help shape infrastructure for the public good. By contributing expertise in curation, governance, and metadata standards, libraries ensure that research information is more complete, discoverable, and reusable across repositories, researcher profiles, and campus systems.
Shared governance is a central feature of COMET’s approach, and institutional involvement helps ensure that decisions reflect the needs of a global, diverse, distributed community. When institutions engage in this work, they align their local priorities with broader efforts to create trustworthy, persistent, and openly governed metadata. This alignment reduces redundancy, increases impact, and builds capacity for meaningful contributions across the ecosystem.
But the benefits of this work aren’t just at the institutional level. For researchers and end users, the results are tangible: better discovery, clearer provenance, and richer metadata that supports citation, reuse, and reproducibility. And for funders, repositories, and service providers, this community-driven model offers a scalable alternative to siloed or proprietary solutions that emphasize interoperability, transparency, and accountability.
That’s why we believe that COMET offers more than just a framework for metadata enrichment. It provides an opportunity for us to embody our mission-driven values and help build the connective infrastructure that research depends on. For CDL, supporting COMET is a way to double down on our long-standing commitment to open, community-led infrastructure. It’s about creating shared pathways to trust, equity, and impact where metadata isn’t hidden or locked down, but serves as the connective tissue for discovery and collaboration.
Proposed revisions to the Principles of Open Scholarly Infrastructure (POSI)
Sustainable, community-driven infrastructure is essential for advancing open scholarship. That’s why UC3 not only championed the Principles of Open Scholarly Infrastructure (POSI) through our advocacy and authorship but also actively supported their adoption by key organizations like ROR, Dryad, DataCite, and Make Data Count. POSI has provided an invaluable framework for transparency, accountability, and community alignment.
As we look toward the future, we’re thrilled to see the next evolution of POSI taking shape with the proposed POSI 2.0 revisions. These updates, informed by real-world experiences of adopters, aim to refine the principles to ensure they remain practical and relevant in an ever-changing landscape. This evolution is not just about updating a framework—it’s about strengthening the foundation for a more open and resilient scholarly ecosystem.
Why does this matter to UC3? As active stewards of open infrastructure, we know that collective input is key to shaping effective, inclusive principles. POSI has empowered organizations to perform self-assessments, build trust with their communities, and advocate for long-term sustainability. We’ve seen firsthand how these principles can elevate not just individual organizations but the entire ecosystem.
With POSI 2.0, we’re calling on the scholarly community to contribute to this critical conversation. The proposed revisions are open for public comment, and this is your chance to help ensure that POSI continues to reflect the needs and aspirations of our diverse community.
How to Get Involved
- Review the Proposed Revisions: Dive into the draft of POSI 2.0 and explore the updates.
- Share Your Feedback: Take the short survey to share your thoughts and insights.
- Spread the Word: Encourage your networks to join this important dialogue.
Deadline: March 5, 2025
Learn More & Participate: https://openscholarlyinfrastructure.org/public-comment-v2/
Understanding the Vision Behind Make Data Count and the Open Global Data Citation Corpus
As the scientific community increasingly embraces open data, the question of how these datasets are being accessed and utilized becomes ever more pressing. Researchers, funders, and policymakers alike are keen to understand the impact and reach of the data they produce, support, and use. This is where the vision of Make Data Count (MDC) and the Data Citation Corpus comes into play.
What is Make Data Count?
Make Data Count is an international initiative aimed at transforming how we measure the impact of open research data. Traditionally, the scholarly community has focused on citations to articles as a metric of impact. However, as research becomes more data-intensive, it’s clear that we need new metrics to capture the influence and reuse of datasets. MDC is committed to developing evidence-based data metrics that go beyond traditional measures, allowing for a more comprehensive understanding of data usage.
MDC’s efforts focus on creating the infrastructure and standards needed to track, collect, and report data usage and citation metrics. This includes not only citations to datasets within scholarly articles but also how data is used across various fields and sectors. The ultimate goal is to provide a holistic view of how open data contributes to scientific progress, policy-making, and beyond.
For more details on the roadmap and future developments of Make Data Count, you can explore the MDC Roadmap.
CDL’s Role in Make Data Count
The University of California Curation Center (UC3) at the California Digital Library (CDL) has been a key player in the Make Data Count initiative since its inception. CDL’s expertise in managing collaborative projects and its commitment to open data practices have been instrumental in the development and implementation of MDC’s goals. Over the years, CDL team members have provided strategic oversight and technical infrastructure. Currently, CDL team members serve on MDC’s advisory committee and work with other key partners, such as DataCite and the MDC Director, Iratxe Puebla, on MDC project execution. CDL continues to play a vital role in fostering collaborations with other institutions and organizations to expand the reach and impact of MDC.
A Centralized Resource for Data Citations
The Data Citation Corpus, developed in collaboration with the Chan Zuckerberg Initiative (CZI) and the Wellcome Trust, is a cornerstone of this vision. The Corpus aims to be a vast, open repository of data citations from diverse sources and disciplines, providing a centralized resource for understanding how data is being cited and reused.
This initiative addresses a significant challenge in the current landscape: the fragmented and incomplete nature of data citation information. While data citations are increasingly being created, the existing workflows for collecting and propagating these citations are often leaky, leading to gaps in the persistent identifier (PID) metadata. Furthermore, in some fields, especially within the life sciences, data sharing via repositories that use accession numbers instead of DOIs is common, which further complicates the collection of metadata on data reuse.
The Data Citation Corpus aggregates data citations from a variety of sources, including:
- Persistent Identifier Authorities: DataCite and Crossref, which collect citations as part of their DOI registration metadata.
- Third-Party Aggregators: Organizations using advanced techniques like full-text mining and machine learning to identify mentions of data in the full text of articles.
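Aggregating citations from both PID authorities and text-mining pipelines raises an obvious deduplication problem: the same citing-work/cited-dataset pair can arrive from more than one source. A minimal sketch of one way to handle this (the record shape is assumed for illustration and is not the corpus’s actual schema):

```python
# Minimal sketch: merge citation records from several sources while
# deduplicating on the (citing work, cited dataset) pair. The record
# shape is illustrative, not the Data Citation Corpus's actual schema.

def merge_citations(*sources) -> list:
    """Keep the first record seen for each (citing, cited) pair."""
    seen = set()
    merged = []
    for source in sources:
        for rec in source:
            key = (rec["citing"], rec["cited"])
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

datacite = [
    {"citing": "10.1234/paper-a", "cited": "10.5061/dryad.x1", "source": "datacite"},
]
mined = [
    {"citing": "10.1234/paper-a", "cited": "10.5061/dryad.x1", "source": "text-mining"},
    {"citing": "10.1234/paper-b", "cited": "10.5061/dryad.x1", "source": "text-mining"},
]

corpus = merge_citations(datacite, mined)
print(len(corpus))  # -> 2 (the duplicate pair is kept once)
```

Source order here expresses a simple precedence (authority records win over mined ones); a production pipeline would also need to reconcile datasets identified by accession numbers rather than DOIs, the life-sciences complication noted above.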
The corpus is being developed in iterative stages, with the initial prototype already incorporating data citations from DataCite event data and the CZI Knowledge Graph. This prototype allows for visualizations based on parameters like institution or data repository, providing valuable insights into how datasets are being cited and used across the research ecosystem.
As the project progresses, the goal is to expand the Data Citation Corpus to include additional sources and features, ultimately creating a resource that different stakeholders—researchers, funders, institutions, and policymakers—can use to integrate data usage information into their work.
Expanding the Corpus and Engaging the Community
To further the goals of expanding and refining the Data Citation Corpus, MDC is hosting a hackathon on September 4, 2024, focused on building curation workflows for the corpus. The hackathon will bring together data scientists, developers, and engineers to work on two key projects: developing user interfaces for the corpus and creating workflows for community-driven curation of data citations.
The hackathon will take place in two locations, with sessions at the Wellcome Trust in London and the California Digital Library in Oakland, California. Participants will collaborate on innovative solutions that will be presented the following day at the MDC Summit.
Stay tuned for a follow-up post where we will share the outcomes of the hackathon and the exciting developments that emerge from this collaborative effort.
Developing a US National PID Strategy
Advancing Research through a Unified National PID Strategy
In a recent project facilitated by the Open Research Funders Group (ORFG) and Research Data Alliance US (RDA US), the focus has been on developing recommendations for a US National Strategy for Persistent Identifiers (PIDs). Co-chaired by Todd Carpenter and me, the ORFG PID Strategies Working Group worked to outline the benefits, challenges, and future steps for a US national approach to PIDs.
Current Landscape and the Need for a National Strategy
The US has actively participated in various international efforts, such as UNESCO’s Open Science toolkit. We also have several national level guidance documents, such as the Holdren Memo, Nelson Memo, and the National Security Strategy for United States Government-Supported Research and Development. However, a national strategy for PIDs has not yet been developed. Recognizing this gap, the ORFG PID Strategies Working Group published a set of recommendations. These recommendations, available on Zenodo, aim to improve the application and interoperability of PIDs across the US research community.
Benefits of Adopting PIDs
The adoption of PIDs brings numerous benefits:
- Discovery and Accessibility: PIDs offer reliable approaches to accessing research outputs, including data files and their associated metadata.
- Reduction of Administrative Burden: PIDs streamline research activities, reducing the time and cost associated with managing and disseminating research outputs.
- Enhanced Research Assessment: PIDs provide a reliable way to evaluate research impact and outcomes through evidence-based metrics.
- Transparency and Accountability: By fostering trust within the research community, PIDs enhance the transparency of research activities.
- Global Collaboration: PIDs facilitate international research collaborations by ensuring interoperability across different systems and platforms.
Developing a US National PID Strategy
The process of developing a national PID strategy involves several steps:
- Community Engagement: Gathering input from various stakeholders, including government agencies, academic institutions, and researchers.
- Technical Implementation: Upgrading legacy systems to modern PID infrastructures, ensuring they meet the desirable characteristics such as persistence, global uniqueness, and interoperability.
- Governance and Support: Establishing centralized governance structures to manage PID systems and provide ongoing support for their adoption and use.
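One small, concrete piece of the technical-implementation step is validating identifier syntax before migrating records into a PID infrastructure. A hedged sketch using the common DOI pattern (a `10.` prefix, a registrant code, and a suffix); note that syntax alone says nothing about persistence, so real systems should also confirm the identifier resolves:

```python
import re

# Sketch: syntactic check for DOIs ("10." + registrant code + "/" + suffix).
# A syntactically valid DOI may still fail to resolve; production workflows
# should also test resolution via the https://doi.org/ resolver.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(identifier: str) -> bool:
    return bool(DOI_PATTERN.match(identifier))

print(looks_like_doi("10.5281/zenodo.10811008"))  # -> True
print(looks_like_doi("zenodo.10811008"))          # -> False
```

Checks like this help surface the legacy, locally minted identifiers that the next section argues need to be transitioned to globally unique, interoperable PIDs.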
Moving Beyond Legacy Systems
Legacy systems often lack the granularity and interoperability needed for modern research management. Transitioning from these systems to more sustainable and accessible PID infrastructures is essential. This involves technological updates, workflow changes, and stakeholder engagement to ensure a smooth transition.
Centralized PID Infrastructure
Supporting centralized PID infrastructures is crucial for a unified approach to research management. Centralized systems provide a single source of truth, addressing the diverse needs of stakeholders and fostering collaboration across the research ecosystem.
Areas for Investment
To support the transition to a national PID strategy, investment is needed in several areas:
- Technical Infrastructure: Ensuring scalability, reliability, and adaptability of PID systems.
- Community Engagement: Promoting community participation in governance and decision-making processes.
- Education and Outreach: Raising awareness about the benefits of PIDs and their role in research management.
- Interoperability: Facilitating seamless data exchange between systems.
- Innovation and Research: Supporting new applications and systems to address emerging research needs.
Next Steps and Measuring Success
The ORFG PID Strategies Working Group has submitted a proposal to the National Information Standards Organization (NISO) to develop these recommendations into a national standard. The process will include public consultations, community participation, and rigorous vetting to ensure the standard meets the needs of the research community.
Conclusion
As we move forward with these initiatives, it is essential for organizations to reflect on their current use of PIDs and consider how they can support the national strategy. By working together, we can enhance the effectiveness, transparency, and impact of research across the United States and beyond.
For more information and to get involved, please refer to the resources linked below:
- Developing a US National PID Strategy: https://doi.org/10.5281/zenodo.10811008
- Desirable Characteristics of Persistent Identifiers: https://doi.org/10.54900/c3hdq-0ev76
- A Roadmap for Developing a US National PID Strategy: https://scholarlykitchen.sspnet.org/2024/03/21/a-roadmap-for-developing-a-us-national-pid-strategy/
We extend our gratitude to the Open Research Funders Group, Helios Open, SPARC, RDA US, and the Pervasive Technology Institute at Indiana University for their support in this endeavor. We look forward to continued collaboration as we advance towards a comprehensive national PID strategy. If you are interested in getting involved, please review our report on Zenodo and join the community discussion at PID Forum: https://pidforum.org/t/developing-a-us-national-pid-strategy-report
Lessons learned from organizing the first ever virtual csv,conf
This blogpost was collaboratively written by the csv,conf organizing team which includes John Chodacki from CDL. csv,conf is supported by the Sloan Foundation and the Moore Foundation. The original post can be found here: https://csvconf.com/going-online
A brief history
csv,conf is a community conference that brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source. Over the years we have had over a hundred different talks from a huge range of speakers, most of which you can still watch back on our YouTube Channel.
csv,conf,v1 took place in Berlin in 2014 and we were there again for v2 in 2016 before we moved across the Atlantic for v3 and v4 which were held in Portland, Oregon in the United States in 2017 and 2019. For csv,conf,v5, we were looking forward to our first conference at the University of California Center in Washington DC, but unfortunately, like many other in-person events, this was not going to be possible in 2020.
People have asked us about our experience moving from a planned in-person event to one online, in a very short space of time, so we are sharing our story with the hope that it will be helpful to others, as we move into a world where online events and conferences are going to be more prevalent than ever.
The decision to take the conference online was not an easy one. Until quite late on, the question csv,conf organizers kept asking each other was not “how will we run the conference virtually?” but “will we need to cancel?“. As the pandemic intensified, this decision was taken out of our hands and it became quickly clear that cancelling our event in Washington D.C. was not only the responsible thing to do, but the only thing we could do.
Weighing the decision to hold csv,conf,v5 online
Once it was clear that we would not hold an in-person event, we deliberated on whether we would hold an online event, postpone, or cancel.
Moving online – The challenge
One of our main concerns was whether we would be able to encapsulate everything good about csv,conf in a virtual setting: the warmth you feel when you walk into the room, the interesting side conversations, the feeling of being reunited with old friends, and of naturally meeting new ones. These were things we didn’t know whether we could pull off. And if we couldn’t, did we want to do this at all?
We were worried about keeping a commitment to speakers who had made a commitment themselves. But at the same time we were worried speakers may not be interested in delivering something virtually, or that it would not have the same appeal. It was important to us that there was value to the speakers, and at the start of this process we were committed to making this happen.
Many of us have experience running events both in person and online, but this was bigger. We had some great advice and drew heavily on the experience of others in similar positions to us. But it still felt like this was different. We were starting from scratch and for all of our preparation, right up to the moment we pressed ‘go live’ inside Crowdcast, we simply didn’t know whether it was going to work.
But what we found was that hard work, lots of planning and support of the community made it work. There were so many great things about the format that surprised and delighted us. We now find ourselves asking whether an online format is in fact a better fit for our community, and exploring what a hybrid conference might look like in the future.
Moving online – The opportunity
There were a great many reasons to embrace a virtual conference. Once we made the decision and started to plan, this became ever clearer. Not least was the fact that an online conference would give many more people the opportunity to attend. We work hard every year to reduce the barriers to attendance where possible and we’re grateful to our supporters here, but our ability to support conference speakers is limited and it is also probably the biggest cost year-on-year. We are conscious that barriers to entry still apply to a virtual conference, but they are different and it is clear that for csv,conf,v5 more people who wanted to join could be part of it. csv,conf is normally attended by around 250 people. The in-person conferences usually fill up with just a few attendees under capacity. It feels the right size for our community. But this year we had over 1,000 registrations. More new people could attend and there were also more returning faces.
Attendees joined csv,conf,v5’s opening session from around the world
Planning an online conference
Despite the obvious differences, much about organizing a conference remains the same whether virtual or not. Indeed, by the time we made the shift to an online conference, much of this work had been done.
Organizing team
From about September 2019, the organizing team met up regularly every few weeks on a virtual call. We reviewed our list of things and assigned actions. We used a private channel on Slack for core organizers to keep updated during the week.
We had a good mix of skills and interests on the organizing team from community wranglers to writers and social media aces.
We would like to give a shout out to the team of local volunteers we had on board to help with DC-specific things. In the end this knowledge just wasn’t needed for the virtual conf.
We recruited a group of people from the organizing team to act as the program committee. This group would be responsible for running the call for proposals (CFP) and selecting the talks.
We relied on our committed team of organizers for the conference and we found it helpful to have very clear roles/responsibilities to help manage the different aspects of the ‘live’ conference. We had a host who introduced speakers, a Q&A/chat monitor, a technical helper and a Safety Officer/Code of Conduct enforcer at all times. It was also helpful to have “floaters” who were unassigned to a specific task, but could help with urgent needs.
Selecting talks
We were keen on making it easy for people to complete the call for proposals. We set up a Google form and asked just a few simple questions.
All talks were independently reviewed and scored by members of the committee and we had a final meeting to review our scores and come up with a final list. We were true to the scoring system, but there were other things to consider. Some speakers had submitted several talks and we had decided that even if several talks by the same person scored highly, only one could go into the final schedule. We value diversity of speakers, and reached out to diverse communities to advertise the call for proposals and also considered diversity when selecting talks. Also, where talks were scoring equally, we wanted to ensure we were giving priority to speakers who were new to the conference.
We asked all speakers to post their slides onto the csv,conf Zenodo repository. This was really nice to have because attendees asked multiple times for links to slides, so we could simply send them to the Zenodo collection.
Though it proved not to be relevant for the 2020 virtual event, it’s worth mentioning that the process of granting travel or accommodation support to speakers was entirely separate from the selection criteria. Although we asked people to flag a request for support, this did not factor into the decision making process.
Creating a schedule
Before we could decide on a schedule, we needed to decide on the hours and timezones we would hold the conference. csv,conf is usually a two-day event with three concurrently run sessions, and we eventually decided to have the virtual event remain two days, but have one main talk session with limited concurrent talks.
Since the in-person conference was supposed to occur in Washington, D.C., many of our speakers were people in US timezones so we focused on timezones that would work best for those speakers. We also wanted to ensure that our conference organizers would be awake during the conference. We started at 10am Eastern, which was very early for West Coast (7am) and late afternoon for non-US attendees (3pm UK; 5pm Eastern Europe). We decided on seven hours of programming each day, meaning the conference ended in late afternoon for US attendees and late evening for Europe. Unfortunately, these timezones did not work for everyone (notably the Asia-Pacific region) and we recommend that you pick timezones that work for your speakers and your conference organizers whilst stretching things as far as possible if equal accessibility is important to you. We also found it was important to clearly list the conference times in multiple timezones on our schedule so that it was easier for attendees to know what time the talks were happening.
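Listing session times in multiple timezones is easy to script rather than compute by hand. A small sketch using Python’s standard `zoneinfo` module (the 10am Eastern start is from our schedule; the specific date is illustrative, and the zone names are standard IANA identifiers):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Render one session start time (10am US Eastern, illustrative 2020 date)
# in several attendee timezones; zone names are standard IANA identifiers.
start = datetime(2020, 5, 13, 10, 0, tzinfo=ZoneInfo("America/New_York"))

for label, zone in [
    ("US West Coast", "America/Los_Angeles"),
    ("UK", "Europe/London"),
    ("Eastern Europe", "Europe/Bucharest"),
]:
    local = start.astimezone(ZoneInfo(zone))
    print(f"{label}: {local:%H:%M}")
```

Generating the table this way also sidesteps daylight-saving mistakes, which are easy to make when converting a whole schedule manually.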
Tickets and registration
Although most of what makes csv,conf successful is human passion and attention (and time!), we also found that the costs involved in running a virtual conference are minimal. Except for some extra costs for upgrading our communication platforms, and making funds available to support speakers in getting online, running the conference remotely saved us several thousand dollars.
We have always used an honour system for ticket pricing. We ask that people pay what they can afford, with some suggested amounts depending on the attendee’s situation. But we needed to make some subtle changes for the online event, as it was a different proposition. We first made it clear that tickets were free, and refunded those who had already purchased tickets.
Eventbrite is the platform we have always used for registering attendees for the conference, and it does the job. It’s easy to use and straightforward. We kept it running this year for consistency and to keep our data organized, even though it involved importing the data into another platform.
We were able to make the conference donation based thanks to the support of the Sloan Foundation, Moore Foundation, and individual contributors and donations. Perhaps because the overall registrations also went up, we found that the donations also went up. In future – and with more planning and promotion – it would be feasible to consider a virtual event of the scale of csv,conf funded entirely by contributions from the community it serves.
Code of Conduct
We spent significant time enhancing our Code of Conduct for the virtual conference. We took in feedback from last year’s conference and reviewed other organizations’ Code of Conduct. The main changes were to consider how a Code of Conduct needed to relate to the specifics of something happening online. We also wanted to create more transparency in the enforcement and decision-making processes.
One new aspect was the ability to report incidents via Slack. We designated two event organizers as “Safety Officers”, and they were responsible for responding to any incident reports and were available for direct messaging via Slack (see the Code of Conduct for full details). We also provided a neutral party to receive incident reports if there were any conflicts of interest.
Communication via Slack
We used Slack for communication during the conference, and received positive feedback about this choice. We added everyone that registered to the Slack channel to ensure that everyone would receive important messages.
We had a Slack session bot that would announce the beginning of each session with the link to the session and we received a lot of positive feedback about the session-bot. For people not on Slack, we also had the schedule in a Google spreadsheet and on the website, and everyone that registered with an email received the talk links via email too. For the session bot, we used the Google Calendar for Team Events app on Slack.
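A session bot of this kind is simple to reproduce. As a hedged, illustrative stand-in (our actual bot was the Google Calendar for Team Events app for Slack; the function, talk title, and URL below are hypothetical and only show the shape of an announcement message):

```python
# Illustrative stand-in for a session-announcement bot message. The real
# bot we used was the Google Calendar for Team Events app for Slack; this
# just shows how an announcement could be built from a schedule entry.

def session_announcement(title: str, speaker: str, url: str) -> str:
    return f'Starting now: "{title}" by {speaker} - join at {url}'

msg = session_announcement(
    "Data Stories",                   # hypothetical talk title
    "A. Speaker",                     # hypothetical speaker name
    "https://example.com/session/1",  # placeholder session link
)
print(msg)
```

A message like this would then be posted to the conference channel on a schedule, which is essentially what the calendar app did for us.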
Another popular Slack channel that was created for this conference was a dedicated Q&A channel allowing speakers to interact with session attendees, providing more context around their talks, linking to resources, and chatting about possible collaborations. At the end of each talk, one organizer would copy all of the questions and post them into this Q&A channel so that the conversations could continue. We received a lot of positive feedback about this and it was pleasing to see the conversations continue.
We also had a dedicated speakers channel, where speakers could ask questions and offer mutual support and encouragement both before and during the event.
Another important channel was a backchannel for organizers, which we used mainly to coordinate and cheer each other on during the conf. We also used this to ask for technical help behind the scenes to ensure everything ran as smoothly as possible.
After talks, one organizer would use Slack private messaging to collate and send positive feedback for speakers, as articulated by attendees during the session. This was absolutely worth it and we were really pleased to see the effort was appreciated.
Slack’s basic service is of course free, but Slack also offers premium upgrades for charities, and we were lucky enough to make use of this. The application process is very easy and takes less than 10 minutes, so it is worth considering.
We made good use of Twitter throughout the conference, and the #commallama and #csvconf hashtags were active throughout the event. The organizers had joint responsibility for this, and it seemed to work. We simply announced the hashtags at the beginning of the day and people picked them up easily. We had a philosophy of ‘over-communicating’ – offering updates as soon as we had them, and candidly. We used Twitter to share updates and calls-to-action, and to amplify people’s thoughts, questions, and feedback.
Picking a video conference platform
Zoom concerns
One of the biggest decisions we had to make was picking a video conferencing platform for the conference. We originally considered using Zoom, but were concerned about a few things. The first was reports of rampant “zoombombing”, where trolls join Zoom meetings with the intent to disrupt them. The second concern was that we are a small team of organizers, and there would be great overhead in moderating a Zoom room with hundreds of attendees – muting, unmuting, etc. We also worried that a giant Zoom room would feel very impersonal. Many of us now spend what is probably an unnecessary amount of our daily lives on Zoom, and we felt that stepping away from it would help mark the occasion as something special. So we made the decision to move away from Zoom and looked to options that were more of a broadcast tool than a meeting tool.
Crowdcast benefits
We saw another virtual conference that used Crowdcast and were impressed with how it felt to participate, so we investigated it as a platform before committing to it, albeit with some reservations.
The best parts of Crowdcast for us were the friendly user interface, which includes a speaker video screen, a dedicated chat section with a prompt bar reading “say something nice”, and a separate box for questions. It felt really intuitive, the features were well considered and useful, and we incorporated most of them.
From the speaker, participant, and host sides, the experience felt good and appropriate. The consideration given to the different user types was clear in the design and appreciated. One great feature was the green room, which is akin to a speakers’ couch backstage at an in-person conference: it helped calm speakers’ nerves and let them check their audio and video settings, discuss cues, etc. before stepping out onto the stage.
Another benefit of Crowdcast is that the talks are immediately available for viewing, complete with chat messages, for people to revisit after the conference. This was great as it allowed people to catch up almost in real time if they missed something on the day, and to feel part of the conference discussions as they developed. We also released all talk videos on YouTube and tweeted the links to each talk.
Crowdcast challenges
But Crowdcast was not without its limitations. Everything went very well, and the following issues were not deal breakers, but acknowledging them can help future organizers plan and manage expectations.
Top of the list of concerns was our complete inexperience with it and the likely inexperience of our speakers. To ensure that our speakers were comfortable using Crowdcast, we held many practice sessions with speakers before the conference, and also had an attendee AMA before the conference to get attendees acquainted with the platform. These sessions were vital for us to practice all together and this time and effort absolutely paid off! If there is one piece of advice you should take away from reading this guide it is this: practice practice practice, and give others the opportunity and space to practice as well.
One challenge we faced was hosting – only one account has host privileges, but we learned that many people can log into that account at the same time to share host privileges. Hosts can allow other people to share their screen and unmute, and they can also elevate questions from the chat to the questions box. They can also kick people out if they are being disruptive (which didn’t happen for us, but we wanted to be prepared). This felt a bit weird, honestly, and we had to stay aware of the power we had when in the host’s position. Weird, but also incredibly useful: it is a key control feature, and essential for an event run by a group rather than an individual.
With Crowdcast, you can only share four screens at a time (so that would be two people each sharing a screen). Our usual setup was a host, with one speaker sharing their screen at a time. We could add a second speaker for talks that had one other presenter, but any more than this and we would have had problems.
It was easy enough for the host to chop and change who was on screen at any time, and there’s no limit on the total number of speakers in a session. So there is some flexibility, and ultimately we were OK. But this should be a big consideration if you are running an event with different forms of presentation.
Crowdcast was also not without its technical hiccups and frustrations. Speakers sometimes fell off the call or had mysterious problems sharing their screens. We received multiple comments/questions on the day about the video lagging/buffering. We often had to resort to the ol’ refresh refresh refresh approach which, to be fair, mostly worked. And on the few occasions we were stumped, there’s quite a lot of support available online and directly from Crowdcast. But honestly, there were very few technical issues for a two-day online conference.
Some attendees wanted info on the speakers (e.g., name, Twitter handle) during the presentation, and we agree it would have been a nice touch to have a button or link in Crowdcast for this. There is the “call to action” feature, but we were using that to link to the Code of Conduct.
Crowdcast was new to us, and new to many people in the conference community. As well as the practice sessions, we found it helpful to set up an FAQ page with content about how to use Crowdcast and what to expect from an online conference in general. Overall, it was a good decision and a platform we would recommend for consideration.
#commallama
Finally, it would not be csv,conf without the #commallama. The comma llama first joined us for csv,conf,v3 in Portland and joined us again for csv,conf,v4. Being around a llama is both relaxing and energising at the same time, and a good way to get people mixing.
Taking the llama online was something we had to do, and we were very pleased with how it worked. It was amazing to see how much joy people got out of the experience, and also interesting to notice how well people naturally adapted to the online environment. People organized themselves into a virtual queue and took turns coming on screen to screengrab a selfie. Thanks to our friends at Mtn Peaks Therapy Llamas & Alpacas for being so accommodating and helping us make this possible.
A big thank you to our community and supporters
As we reflect on the experience this year, one thing is very clear to us: the conference was only possible because of the community who spoke, attended, and supported us. It was a success because the community showed up, was kind and welcoming, and was extremely generous with their knowledge, ideas, and time. The local people in D.C. who stepped up to offer knowledge and support on the ground were a great example of this, and we are incredibly grateful for that support, even though it turned out not to be needed.
We were lucky to have a community of developers, journalists, scientists, and civic activists who intrinsically know how to interact with and support one another online, and who adapted well to the realities of an online conference. From the moment speakers attended our practice sessions on the platform and started to support one another, we knew that things were going to work out. We knew things would not all run to plan, but we trusted that the community would be understanding and would actively help us solve problems. It’s something we are grateful for.
We were also thankful to the Alfred P. Sloan Foundation, the Moore Foundation, and our 100+ individual supporters for deciding to support us financially. It is worth noting that none of this would have been possible without our planned venue, hotel, and catering partners being very understanding in letting us void our contracts without any penalties.
Looking ahead – the future of csv,conf
Many people have been asking us about the future of csv,conf. Firstly, it’s clear that csv,conf,v5 has given us renewed love for the conference and made the need for a conference like this abundantly clear. It’s also likely that the momentum generated by running the conference this year will sustain enthusiasm amongst organizers for putting something together next year.
So the question becomes: “what should a future csv,conf look like?” We will certainly be considering our experience of running this year’s event online. It was such a success that there is an argument for keeping it online going forward, or putting together something of a hybrid. Time will tell.
We hope that this has been useful for others. If you are organizing an event and have suggestions or further questions that could improve this resource, please let us know. Our Slack remains open and is the best place to get in touch with us.
The original version of this blogpost was published on csvconf.com and republished here with kind permission.
csv,conf,v5 moves online
csv,conf is a non-profit community conference run by folks who really love data and sharing knowledge. In the first two years, organizers established the event’s scope and community in Berlin, Germany. In the third and fourth years, the organizers moved the event to Portland, Oregon. This year, we had hoped to move the event to Washington, DC and host csv,conf,v5 at the University of California Center in the nation’s capital. However, with the ongoing pandemic, we have moved the conference online.
Check out the csv,conf,v5 schedule at https://csvconf.com/speakers/
On May 13-14, 2020, the fifth version of csv,conf will be held virtually. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas and kickstart collaborations. You are welcome to attend, but you must register by the end of day on May 12.
Register for csv,conf,v5 at https://csvconfv5.eventbrite.com
What is csv,conf?
Over the past several years, UC3 has worked with partners at The Carpentries, Open Knowledge International, DataCite, rOpenSci, and Code for Science and Society to organize csv,conf (https://csvconf.com). For those that aren’t familiar with the concept, csv,conf brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source.
Although a ubiquitous term, the acronym CSV has varied meanings depending on who you ask. In the data space, CSV often translates to comma-separated values – a machine-readable data format used to store tabular data in plain text. To many, the format represents simplicity, interoperability, compactness, hackability, among other things.
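That simplicity is easy to demonstrate: a CSV document is just a header row and data rows of plain text, readable by virtually any tool. A minimal sketch in Python’s standard library (the rows here are illustrative):

```python
import csv
import io

# A complete CSV document: one header row plus two data rows, all plain text.
raw = "city,year\nBerlin,2014\nPortland,2017\n"

# csv.DictReader pairs each row's values with the header fields.
rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    print(row["city"], row["year"])
```

The same file opens equally well in a spreadsheet, a text editor, or a command-line tool, which is much of the format’s appeal.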
From when it first launched in July 2014 as a conference for data makers everywhere, csv,conf adopted the comma-separated-values format in its branding metaphorically. Needless to say, as a data conference that brings together people from different disciplines and domains, conversations and anecdotes shared at csv,conf are not limited to the CSV file format.
Check out past conference sessions on our YouTube channel.
Join us online
Make sure to check out the csv,conf,v5 schedule at https://csvconf.com/speakers/ and register for csv,conf,v5 at https://csvconfv5.eventbrite.com
The UC3 team is excited to be part of the conference committee and happy to answer any questions you may have. Feel free to reach out to us at uc3@ucop.edu or to the full committee at csv-conf-coord@googlegroups.com.
Farewell and Thank You to Chris Erdmann
UC3 will bid farewell to Chris Erdmann on September 30th. Chris joined UC3 in May 2018 as Library Carpentry Community & Development Director and has spent the past year and a half expanding the Library Carpentry community in many ways. Chris is moving on to a new role at UNC Chapel Hill, but he will continue to be involved in Library Carpentry as a lesson maintainer and Advisory Group member.
We’ve cross-posted a farewell and final reflection that Chris published on The Carpentries blog.
September 30th will be my last day in the role of Library Carpentry Community & Development Director. I have been fortunate to meet so many amazing people working in libraries and the research community during this time. Thank you to the IMLS, the California Digital Library, and The Carpentries for this great opportunity. So many members of the community have helped Library Carpentry grow these past couple of years, not to mention the initial hard work that went into starting Library Carpentry. Together we have moved Library Carpentry to a formal Lesson Program in The Carpentries. We have welcomed new community members and run more workshops and events around the world. We have improved and expanded the curriculum thanks to the efforts of a diverse community of Maintainers and contributors from around the world. The Curriculum Advisory Committee and the Advisory Group continue to provide invaluable guidance on how we can move forward. Libraries have become an important part of The Carpentries Membership (over 60% are members) and thanks to additional support from the IMLS, libraries will continue to be an important part of the continued success of The Carpentries. I think Elaine Westbrooks, the University Librarian at University of North Carolina Chapel Hill (where I will be headed in October), said it best in her post about the importance of libraries in The Carpentries, The Strategic Value of Library Carpentry and The Carpentries to Research Libraries.
I will continue to be a member of the community, as a Maintainer on lessons, as a member of the Advisory Group, and will continue to teach and participate in discussions, so this is not goodbye. Instead, I will close by sharing some of the fun stories I have had with community members this past year and a half:
Tim Dennis reached out to me right when I started and invited me to teach a workshop at UCLA with him, and then weeks later, Tim and Jamie Jamison helped with hosting the Mozilla Global Sprint from the UCLA Library Data Science Lab. I think all of us were on a sugar high during the sprint.
@LibCarpentry #MozSprint @ucla_ssda @UCLA_YRL the cakes have arrived pic.twitter.com/XbFKq8DcEI
— Tim Dennis (@jt14den) May 10, 2018
CarpentryCon was a rush, meeting many members of the community, but I will never forget my reenactment of Run Lola Run through the streets of Dublin with David Kane to get to the CarpentryCon dinner on time or my bus ride through the Irish countryside with Scott Peterson and Daniel Bangert.
Post #CarpentryCon2018, had a great time with @enigmaticocean and @scottcpeterson2 exploring the Irish countryside today! pic.twitter.com/pFOkAQhEbO
— Chris Erdmann (@libcce) June 2, 2018
Thanks to Birgit Schmidt for inviting me to LIBER 2018 to speak about Library Carpentry and The Carpentries. This later led to a Carpentries Instructor Training at LIBER 2019 at UCD Library in Dublin. After LIBER, I was able to join Katrin Leinweber, Mateusz Kuzak, Konrad Förstner and others at the TIB Hannover FAIR Data & Software Carpentries-Based Workshop. This workshop was an inspiration in so many ways!
.@konradfoerstner is again leading us in an interactive activity binning ourselves into how familiar we are with creating installation packages, automated tests, continuous integration #TIBFDS pic.twitter.com/YBsKhr3HFm
— Chris Erdmann (@libcce) July 11, 2018
At the August 2018 University of Calgary Instructor Training, I met so many people who would ultimately become community members helping Library Carpentry grow. I handed out Effin Birds mugs as prizes and was finally able to see Lake Louise with Juliane Schneider. Oh Canada, you’re beautiful.
Well worth the wait 😀 pic.twitter.com/MqClGciSTb
— Chris Erdmann (@libcce) September 1, 2018
Australian Research Data Commons (ARDC) inspired a global sprint in November 2018 for us to try out a new format: Top 10 FAIR Data & Software Things. The event brought in new members and allowed us all to develop guides on what FAIR meant for particular disciplines and/or topics. It was a fun experience, ending the day talking to colleagues in Australia and waking up the next day talking to colleagues in Europe.
Great to meet @matkuzak @KristinaHettne Peter Verhaar and others from Leiden and Utrecht. Good luck finishing the #Top10FAIR Sprint today! pic.twitter.com/xqroy6ucC6
— Natasha Simons (@n_simons) November 30, 2018
MTSU invited us to do a three-day workshop at the start of 2019. Somehow we pulled it off, with Juliane Schneider and me recovering from the flu and one of our instructors getting the flu just before the workshop. Anna Oates was able to avoid the flu with her superhuman immune system and delivered her first of many amazing training sessions. Of course, we had to go and see the Parthenon replica in Nashville.
“Nashville Parthenon”by schnaars is licensed under CC BY-SA 2.0
In February, ARDC hosted a Library Carpentry workshop tour of Australian cities. They continue to be such amazing supporters of The Carpentries along with so many others there. A special thanks to Natasha Simons for showing me Australia along the way.
Hello Brisbane! @n_simons and I are ready to say hello to @Griffith_Uni tomorrow! pic.twitter.com/BBpm0bV85w
— Chris Erdmann (@libcce) February 13, 2019
Electronic Resources & Libraries hosted its first Data Science in Libraries track in March 2019 inspired by The Carpentries efforts. It is great to see them offer the track once more, to have a Carpentries table at the exhibitor showcase, and to hopefully run workshops at the 2020 conference.
NESCLIC members Joshua Dull and @KristyDawnL running a session on jargon busting to give everyone at @ERandL #erl19 a taste of what a Library Carpentry includes https://t.co/GC3JaYX980 #DataScience #libraries pic.twitter.com/BjDU6DhHEW
— Library Carpentry (@LibCarpentry) March 5, 2019
Somewhere in all of this my wife and I had a baby and everyone has been supportive throughout. I remember one conversation though before I went on paternity leave with Yared Abera Ergu about work and family. It was one of those conversations about life that you have with community members on the side that makes working in this community so special.
Thank you all! I hope I have helped you as much as you have helped me. I will see you out there in the community.
Library Carpentry Receives Supplemental IMLS Grant
To support the ongoing work of Library Carpentry and its data and software training for library- and information-related roles, we are happy to report that IMLS has awarded CDL supplemental funding. This supplemental funding will provide continued support for workshops and instructor training, as well as create a membership scholarship program to reach new library communities and consortiums. The funding will also provide continued support for Library Carpentry’s current goals: to expand the pool of Carpentries trainers and instructors from library- and information-related roles, and to complete and formalise curriculum and lessons currently being developed by community members. The CDL, The Carpentries, and the Library Carpentry Advisory Group are currently planning outreach to various library networks to see how we can work together towards providing data and software training to their communities. Members of these groups will be reaching out in the coming months. Also, this month (September 2019), The Carpentries will launch a new workshop request form that will respond to library-driven and related workshops.
About CDL
CDL was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. Since then, in collaboration with the ten UC campus libraries and other partners, CDL has assembled one of the world’s largest digital research libraries and changed the ways that faculty, students, and researchers discover and access information. We facilitate the licensing of online materials and develop shared services used throughout the UC system. Building on the foundations of the Melvyl Catalog, CDL has developed one of the largest online library catalogs in the United States and works in partnership with the UC campuses to bring the treasures of California’s libraries, museums, and cultural heritage organizations to the world. We continue to explore how services such as digital curation, scholarly publishing, archiving and preservation support research throughout the information lifecycle.
About The Carpentries
The Carpentries builds global capacity in essential data and computational skills for conducting efficient, open, and reproducible research. We train and foster an active, inclusive, diverse community of learners and instructors that promotes and models the importance of software and data in research. We collaboratively develop openly-available lessons and deliver these lessons using evidence-based teaching practices. We focus on people conducting and supporting research.
UC Data Network: Lessons Learned
Scholars at the University of California need effective solutions to preserve their research data. This is essential for complying with funder mandates, publication requirements, policies, and evolving norms of scholarly best practice. However, several cost barriers have impeded consistent, comprehensive preservation of UC research data. In an attempt to tackle some of these challenges, California Digital Library (CDL) brought together campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs)/Research IT, and University Librarians (ULs) from across the UC system to explore the creation of a UC Data Network (UCDN) as a distributed storage solution.
For the past 18 months, CDL has led an exploratory pilot preservation project to establish UCDN with three campuses. We have now decided to conclude this pilot and want to take this opportunity to reflect on our successes and challenges in tackling such an ambitious scope of work. There are many lessons learned. We offer this post as a way of capturing some of the main findings and takeaways of the UCDN activities.
UCDN pilot project
Campuses routinely grapple with how to offer long-term preservation for the research data their researchers create. The goal of the UCDN project was to chip away at one consistent hurdle: recurring data storage costs associated with long-term digital preservation. In early 2018, we brought together VCRs, CIOs, and ULs across the UC system to explore pilot ideas for tackling this hurdle. From those consultations we crafted a pilot project: pilot campuses would make upfront capital investments in storage, and CDL would plug that storage into our Merritt preservation repository. This storage, via the preservation repository, would then be used by UC’s Dash data publishing platform. In essence, the pilot entailed moving the costs of preserving published datasets from a recurring individual campus expense to a shared UC-wide investment.
What we learned
After nearly 18 months, we have decided to conclude the UCDN pilot. We have learned several lessons that can help guide where we go next.
Lesson #1. We need to make preservation a more compelling story for users. It was difficult to demonstrate UCDN’s value to researchers. We were piloting a service focused on the back-end storage costs of preservation services. This was not an easy story to tell, and our outreach to campuses and researchers was often lost when describing this relationship.
Lesson #2. Project ownership is key. We knew that buy-in from multiple departments was key to the success of UCDN. Campus IT teams, libraries, and research offices all needed to own this effort and we were successful in getting traction at the beginning. However, as time progressed and storage provision became one immediate task, we saw that the project lost broad ownership. While commitment remained high, we were not able to find specific champions to ensure the pilot remained top priority.
Lesson #3. Smaller scale ≠ smaller scope. We started the project knowing that multiple campuses provisioning and maintaining storage for a pilot might be risky. To help mitigate this, we started with a set of 3-4 campuses. This smaller set of campuses, however, did not reduce the overall complexity of the project and we quickly saw that reducing the scale of the pilot did not reduce the scope of the effort: instead of working on a small pilot, we ended up trying to achieve a full solution at fewer places.
Lesson #4. Systemwide efforts are not necessarily (or uniquely) efficient. Our original premise was that a systemwide effort at data preservation would be the most efficient approach. However, as the pilot progressed, we realized that the wider academic community beyond UC was also grappling with similar cost issues. Pilot team members realized that appropriate economies of scale should actually come from collaborations beyond the UC system.
Lesson #5. We need to keep our eyes on the prize. Our original goal was to remove the cost barriers to data preservation. The UCDN pilot team remained focused on this as our goal and the pilot experience gave us the space to brainstorm alternative approaches to tackling this issue. This consistent focus on our ultimate goal eventually led to the partnership CDL forged with Dryad (described further below).
What’s next
While we have decided not to continue with the UCDN pilot, we are now in a position to leverage our lessons learned and achieve the original goals of the UCDN effort by focusing time and resources on our new Dryad partnership.
CDL is now putting the finishing touches on the rollout of the Dryad data publishing service across all UC campuses. Dryad is a trusted name in the researcher community and, with this new arrangement, it will be a space where UC researchers can publish their datasets in a repository with consistent preservation policies, at no cost to the researcher, department, or campus. This means that UC will be able to simultaneously drive adoption of data publishing and long-term stewardship in one space…and without the hurdles associated with recurring storage costs. And with this, we will have met the original goals of the UCDN project.
csv,conf,v4: call for proposals
Although a ubiquitous term, the acronym CSV has varied meanings depending on who you ask. In the data space, CSV often translates to comma-separated values – a machine-readable data format used to store tabular data in plain text. To many, the format represents simplicity, interoperability, compactness, hackability, among other things.
From when it first launched in July 2014 as a conference for data makers everywhere, csv,conf adopted the comma-separated-values format in its branding metaphorically. Needless to say, as a data conference that brings together people from different disciplines and domains, conversations and anecdotes shared at csv,conf are not limited to the CSV file format.
On May 8-9, 2019, the fourth version of csv,conf is set to take place at Eliot Center in Portland, Oregon. Over two days, attendees will have the opportunity to hear about ongoing work, share skills, exchange ideas (and stickers!) and kickstart collaborations. You are welcome to submit session proposals for our 25-minute talk slots between now and end of day, February 9, 2019.
The commallama has now become a big and fun part of csv,conf. How did we settle on this llama? What is its significance? Is it even a llama? We hear your questions, and implore you to join us in Portland on May 8 and 9 to meet the commallama and find out!
We are keen on getting as many people as possible to csv,conf,v4, and will award travel grants to subsidize travel and associated costs for interested parties that lack the resources and support to get them to Portland. To that end, we have set up our honor-system, conference ticketing page on Eventbrite. We encourage you to get your conference tickets as soon as possible, keeping in mind that as a non-profit and community-run conference, proceeds from ticket sales will help cover our catering and venue costs in addition to offering travel support for speakers and attendees where needed.
Across the first three conferences, held over the last four years, csv,conf has brought together over 500 participants from over 30 countries, and 300+ talks spanning over 180 hours have been presented, packaged, and shared on our YouTube channel. Many post-conference narratives and think pieces, as well as interdisciplinary collaborations, have also surfaced from previous conferences. This is only part of the story, and we can’t wait to see and hear from you in Portland in May, and are excited for all that awaits!
The UC3 team is part of the conference committee and happy to answer any questions you may have. Feel free to reach out to us at uc3@ucop.edu or to the full committee at csv-conf-coord@googlegroups.com.