
Expanding Data and Software Skills Training

The UC3 team consults with the broad UC library community regarding research data management and data publishing in various capacities. Over the past several years, we have seen a keen interest in expanding skills training opportunities for librarians in the areas of data skills and software skills. In an effort to amplify existing training programs and build capacity across the UC, we are looking forward to continuing our work in 2021 in the following ways:

Collaboration with The Carpentries

When it comes to teaching data science skills in a community-based, hands-on, train-the-trainer format, few opportunities can compare to The Carpentries. In 2017, we established a working partnership with them that has been tremendously successful. The Carpentries originally focused exclusively on Data Carpentry and Software Carpentry; together, with support from a generous IMLS grant, we worked to expand their remit by introducing and growing Library Carpentry. This investment has helped equip our library communities and has also led to a large and vibrant international Library Carpentry community, of which UC as a whole is now a significant part. In 2020, we completed our project work on expanding The Carpentries across California.

In 2021, we aim to keep that momentum going. Many of our team members will be actively involved at an organizational level:

Eric Lopatin will continue in his roles as Maintainer of the Intro to GitHub lesson and as a regular workshop Instructor and Helper.
Catherine Nancarrow will remain a member of the Library Carpentry Advisory Group. In addition, she will extend our partnership with Carpentries staff by contributing to the implementation of a new Membership program that will make training more widely available to a diverse range of library communities.
John Chodacki will join the Carpentries Executive Council, which is responsible for strategic and organizational planning, financial oversight, and overall service and program assessment.

Other avenues for collaboration

In addition to The Carpentries, UC3 is continually looking for skills training programs and events to support in 2021.

Our team is active in FORCE11 and we are excited to collaborate with the UCLA-hosted FORCE11 Scholarly Communications Institute (FSCI). The next FSCI will be held July 26-30, 2021, and we hope to see many colleagues there.
We collaborate regularly with colleagues at the San Diego Supercomputer Center’s Research Data Services Division and will continue to assist and support them as they host FAIR data training and webinars throughout the year.
The UC Libraries’ Digital Preservation Working Group is transitioning to a new structure that will focus on training opportunities for librarians in the area of digital preservation. UC3 will be helping with the development and roll out of resulting projects.
Our partnership with Dryad has focused attention on the important role of data curation. We are working with Dryad and their colleagues at the Data Curation Network (DCN) on how best to extend the DCN’s training and primers so that they are available and applicable to new institutions and communities.

Lots of work ahead

We are always open to hearing your ideas for ways to deepen our work with skills building and training at all levels. We are especially keen to collaborate on projects that will benefit the library community and look forward to another successful year ahead!

This is the last post in the “A Peek Into 2021 for UC3” series.

Furthering Open Science through Research Data Management Services

As I begin my second year at CDL, I am excited to outline the objectives and key activities for my work: furthering research data management (RDM) practices that support open science at the University of California and beyond.

I conceptualize our work in the larger context of what an ideal RDM ecosystem might be: one in which open science practices are universally understood and implemented by data creators and stewards, built upon the bedrock of simple, interoperable RDM infrastructure and optimal open data policy. Below are four key ways in which RDM services at CDL will contribute to this overall effort in 2021:

Facilitating Communication Between Data Librarians and Researchers

For almost ten years now, the DMPTool web application has provided accessible, jargon-free, practical guidance for researchers to create and implement effective data management plans for 30+ funding agencies. Thanks to our dedicated Editorial Board, we are able to keep the tool in sync with current funder requirements and best practices.

In 2021, we will be expanding our outreach to the library community by offering quarterly community calls with DMPTool users in order to discuss new features, highlight community use, and facilitate feedback. Additionally, the DMPTool Editorial Board will analyze existing guidance within the tool to identify aspects that need to be updated or new topics that should be included. The DMPTool has long been a community-supported application and we will continue to expand our engagement with the community as we grow the application.

Serving as an Interoperable Partner in Essential RDM Services

Our work developing the next generation of machine-actionable, networked DMPs builds upon community-developed standards and is rooted in collaboration. As we create the new networked DMP, these partnerships will continue to be essential to our success. Last year’s release of the RDA DMP Common Standard for machine-actionable Data Management Plans and the recent report Implementing Effective Data Practices: Stakeholder Recommendations for Collaborative Research Support (written by CDL, ARL, AAU & APLU) are testament to the power of these partnerships. We simply get more done when we work together. Additionally, our continued collaboration with DMPonline allows us to share resources as we co-develop the DMPRoadmap codebase, share best practices, and advance new features jointly.

Looking ahead, in 2021, we will expand on our collaborations including:

Partnering with DataCite to encourage adoption of the new DMP ID, a resource made possible by the forthcoming metadata schema update. Expect more updates on this soon!
A new integration between the DMPTool and electronic lab notebook platforms, starting with RSpace.
Partnering with the UC Natural Reserve System and the Tetiaroa Society to advance data policies supporting open science at working field stations.

Supporting a Transparent Research Process

Much of our work last year was focused on developing the backend infrastructure necessary for us to say confidently that DMPTool DMPs are machine-actionable.

With the infrastructure in place and development completed, in 2021 we will be releasing several new features to expand the possibilities of the new networked DMP and help ensure transparency in the research process. Many of these new features are currently being pilot tested as part of the FAIR Island Project. We will also be conducting webinars in the coming weeks to gather feedback from the community to further inform our iterative feature development and release cycles.

Developing Optimal Open Data Policies

The FAIR Island project is a real-world use case evaluating the impact of implementing optimal research data management policies and requirements; the project will help demonstrate and publicize the outcomes of strong data policies in practice at a working field station.

With the recent addition of Erin Robinson to the team, the FAIR Island project is making swift progress towards implementing a data policy that will govern data collected on the Tetiaroa atoll. This data policy is still open to community feedback so if you are interested in contributing, now is your chance! Please share your thoughts via this survey.

In 2021, the FAIR Island project team will continue to iterate on and refine the data policy, working with additional field stations to advance data policies that support open science. In partnership with the UC Natural Reserve System and 4Site network, we aim to move toward a common, optimal data policy that can be shared amongst UC field stations and other partner sites. To keep abreast of our progress, please check out our project website, where we are tracking our work on the project blog.

How to contribute

Building on a solid foundation of community-developed standards for DMPs and FAIR data, this year we will be moving much of this work from theory into real-world implementation.

It’s an exciting time for these developments and we welcome all questions, comments, and advice. Please reach out with your thoughts!

Identifier services at CDL: Connecting our communities

Last week, more than 1100 people registered to attend PIDapalooza21, a 24-hour-long virtual event celebrating persistent identifiers and the communities that use them. Held online for the first time due to the COVID-19 pandemic, PIDapalooza21 was able to attract a much larger audience than any of the four previous PIDapaloozas. But even by virtual event standards, 1100 people is still a lot! Does this mean that persistent identifiers are now mainstream? And if so, what does this mean for the identifier services and initiatives that we lead at California Digital Library?

Background: Identifiers at CDL

The identifiers portfolio at CDL encompasses technical infrastructure, such as the EZID service, and cross-organization collaborations, such as CDL’s leadership role in the Research Organization Registry (ROR) as well as PIDapalooza. In the spirit of CDL’s broader mission and vision, the identifiers portfolio aims to enable the discoverability, citability, and long-term stewardship of UC data, research outputs, publications, special collections, and archives. We do this by supporting identifier use and adoption across the UC campuses, enriching CDL’s core infrastructure for publishing, data management, and preservation, and contributing to global identifier initiatives that can help to further amplify UC scholarship.

Identifiers in open research infrastructure

While persistent identifiers themselves are associated with fixity and stability, the landscape in which they are situated is constantly shifting. Research infrastructure is becoming more complex and more connected. Institutions, publishers, funders, and policymakers are under more pressure to track and quantify research activities. Infrastructure providers face growing costs and competition.

Identifiers can help us navigate this landscape and address these challenges. But persistent identifiers alone are not the solution.

A DOI string for a dataset doesn’t tell us anything about what data it captures, who created it, where that researcher is affiliated, or which funders supported the research project. But if the metadata registered with the DOI includes these details, the identifier becomes meaningful and powerful.
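To make this point concrete, here is a minimal Python sketch contrasting a bare DOI with one that carries descriptive metadata. The record structure, the field names (loosely echoing ideas like creators, affiliations, and funding references), and the example DOI are hypothetical simplifications for illustration only; they are not the actual DataCite or Crossref metadata schemas, which are far richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creator:
    name: str
    affiliation: Optional[str] = None  # hypothetical: an organization name or ROR ID


@dataclass
class DoiRecord:
    # Hypothetical, simplified record; NOT the real DataCite/Crossref schema.
    doi: str
    title: Optional[str] = None
    creators: List[Creator] = field(default_factory=list)
    funders: List[str] = field(default_factory=list)


def describe(record: DoiRecord) -> str:
    """Answer 'what, who, where, funded by whom' using only the registered metadata."""
    if record.title is None and not record.creators and not record.funders:
        # A bare identifier resolves, but tells us nothing about the research.
        return f"{record.doi}: no metadata, the identifier alone says nothing"
    title = record.title or "untitled"
    who = ", ".join(
        f"{c.name} ({c.affiliation or 'affiliation unknown'})" for c in record.creators
    ) or "unknown creators"
    funded_by = ", ".join(record.funders) or "no funders recorded"
    return f"{record.doi}: '{title}' by {who}; funded by {funded_by}"


if __name__ == "__main__":
    bare = DoiRecord("10.1234/example-dataset")  # hypothetical DOI
    rich = DoiRecord(
        "10.1234/example-dataset",
        title="Example field station temperature readings",
        creators=[Creator("A. Researcher", "Example University")],
        funders=["Example Funding Agency"],
    )
    print(describe(bare))
    print(describe(rich))
```

The same identifier string appears in both records; only the attached metadata lets a person or a machine answer questions about the dataset, its creators, their affiliations, and its funders.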

While more people are paying attention to identifiers these days—as evidenced by the record attendance at PIDapalooza—there is still a need to drive home the point that the identifiers themselves are not the goal; it’s about what the identifiers can do. In order to fulfill the true promise of identifiers, we need to be able to connect them through open data, open metadata, and open infrastructure. And we need communities to understand how to do this and why it is important.

What does this mean for EZID?

A year ago, we reflected on EZID’s evolution as the service pivoted from its original foundations to pursue a new vision for the future of identifier services at CDL. In the past year, work on EZID has focused on modernizing and strengthening its core infrastructure so that we can achieve this vision. Some highlights include:

We completed a migration to Linux2 and are working on upgrading the EZID codebase.
We streamlined core functionality for shoulder handling and identifier minting to remove dependencies on external systems.
We added support for the latest versions of the DataCite and Crossref DOI metadata schemas so that users can include richer metadata for their DOIs and also register DOIs for new content types such as preprints.

As 2021 gets underway, work on EZID and across the identifiers portfolio in general will involve building on these infrastructure efforts to harness the power of PIDs. Our goals fall into three main areas:

Enriching our metadata: Encourage and enable better DOI metadata by helping users take advantage of the DataCite and Crossref schemas to provide more descriptive information that supports tracking and discovery.
Opening and connecting our infrastructure: Help users understand the power of connecting research through identifiers in open infrastructure like the DMPHub, Make Data Count, and DataCite Commons.
Promoting best practices and policies: Provide recommendations and guidance for research stakeholders working with identifiers, such as those summarized in the recent report, Implementing Effective Data Practices: Stakeholder Recommendations for Collaborative Research Support.

Identifiers are a key aspect of our collective research infrastructure and it is an exciting time to be working on how to best leverage them to connect our communities.