Identifier services at CDL: Connecting our communities

Maria Gould, February 5, 2021

Posted in: EZID

Last week, more than 1,100 people registered to attend PIDapalooza21, a 24-hour virtual event celebrating persistent identifiers and the communities that use them. Held online for the first time due to the COVID-19 pandemic, PIDapalooza21 attracted a much larger audience than any of the four previous PIDapaloozas. But even by virtual event standards, 1,100 people is still a lot! Does this mean that persistent identifiers are now mainstream? And if so, what does this mean for the identifier services and initiatives that we lead at California Digital Library?

Background: Identifiers at CDL

The identifiers portfolio at CDL encompasses technical infrastructure, such as the EZID service, and cross-organization collaborations, such as CDL’s leadership role in the Research Organization Registry (ROR) as well as PIDapalooza. In the spirit of CDL’s broader mission and vision, the identifiers portfolio aims to enable the discoverability, citability, and long-term stewardship of UC data, research outputs, publications, special collections, and archives. We do this by supporting identifier use and adoption across the UC campuses, enriching CDL’s core infrastructure for publishing, data management, and preservation, and contributing to global identifier initiatives that can help to further amplify UC scholarship.

Identifiers in open research infrastructure

While persistent identifiers themselves are associated with fixity and stability, the landscape in which they are situated is constantly shifting. Research infrastructure is becoming more complex and more connected. Institutions, publishers, funders, and policymakers are under more pressure to track and quantify research activities. Infrastructure providers face growing costs and competition.

Identifiers can help us navigate this landscape and address these challenges. But persistent identifiers alone are not the solution.

A DOI string for a dataset doesn’t tell us anything about what data is captured, who created it, where that researcher is affiliated, or which funders supported the research project. But if the metadata registered with the DOI includes these details, the identifier becomes meaningful and powerful.
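To make the contrast concrete, here is a minimal sketch in Python of a bare DOI string next to a metadata record that carries creator, affiliation, and funder details. The field names are loosely modeled on the DataCite metadata schema. The DOI, names, and identifiers are hypothetical placeholders, and the describe function is an illustrative helper rather than part of EZID or any DataCite tooling.

```python
# Illustrative only: field names loosely follow the DataCite metadata schema.
# The DOI, names, and identifiers below are hypothetical placeholders.

bare_doi = "10.5072/example-dataset"  # on its own, just an opaque string

record = {
    "doi": bare_doi,
    "titles": [{"title": "Example coastal temperature readings"}],
    "creators": [
        {
            "name": "Researcher, Example",
            "affiliation": [
                {
                    "name": "Example University",
                    # Placeholder, not a real ROR ID
                    "affiliationIdentifier": "https://ror.org/EXAMPLE",
                    "affiliationIdentifierScheme": "ROR",
                }
            ],
        }
    ],
    "fundingReferences": [
        {"funderName": "Example Science Foundation", "awardNumber": "ABC-123"}
    ],
}


def describe(rec):
    """Summarize who created the item, where they work, and who funded it."""
    creators = rec.get("creators", [])
    affiliations = sorted(
        {a["name"] for c in creators for a in c.get("affiliation", [])}
    )
    return {
        "creators": [c["name"] for c in creators],
        "affiliations": affiliations,
        "funders": [f["funderName"] for f in rec.get("fundingReferences", [])],
    }


rich_summary = describe(record)             # names, affiliations, funders
bare_summary = describe({"doi": bare_doi})  # every list comes back empty
```

With only the DOI string, every question about the dataset comes back empty. With the richer record, the same identifier can be connected to a person, an institution, and a funder.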

While more people are paying attention to identifiers these days, as evidenced by the record attendance at PIDapalooza, there is still a need to drive home the point that the identifiers themselves are not the goal; what matters is what the identifiers can do. In order to fulfill the true promise of identifiers, we need to be able to connect them through open data, open metadata, and open infrastructure. And we need communities to understand how to do this and why it is important.

What does this mean for EZID?

A year ago, we reflected on EZID’s evolution as the service has pivoted from its original foundations to pursue a new vision for the future of identifier services at CDL. In the past year, work on EZID has been focused on modernizing and strengthening its core infrastructure so that we can achieve this vision. Some highlights include:

We completed a migration to Linux2 and are working on upgrading the EZID codebase.
We streamlined core functionality for shoulder handling and identifier minting to remove dependencies on external systems.
We added support for the latest versions of the DataCite and Crossref DOI metadata schemas so that users can include richer metadata for their DOIs and also register DOIs for new content types such as preprints.

As 2021 gets underway, work on EZID and across the identifiers portfolio in general will involve building on these infrastructure efforts to harness the power of PIDs. Our goals fall into three main areas:

Enriching our metadata: Encourage and enable better DOI metadata by helping users take advantage of the DataCite and Crossref schemas to provide more descriptive information that supports tracking and discovery.
Opening and connecting our infrastructure: Help users understand the power of connecting research through identifiers in open infrastructure like the DMPHub, Make Data Count, and DataCite Commons.
Promoting best practices and policies: Provide recommendations and guidance for research stakeholders working with identifiers, such as those summarized in the recent report, Implementing Effective Practices for Data: Recommendations for Collaborative Research Support.

Identifiers are a key aspect of our collective research infrastructure, and it is an exciting time to be working on how best to leverage them to connect our communities.