
Identifier services at CDL: Connecting our communities

Last week, more than 1100 people registered to attend PIDapalooza21, a 24-hour-long virtual event celebrating persistent identifiers and the communities that use them. Held online for the first time due to the COVID-19 pandemic, PIDapalooza21 was able to attract a much larger audience than any of the four previous PIDapaloozas. But even by virtual event standards, 1100 people is still a lot! Does this mean that persistent identifiers are now mainstream? And if so, what does this mean for the identifier services and initiatives that we lead at California Digital Library?

Background: Identifiers at CDL

The identifiers portfolio at CDL encompasses technical infrastructure, such as the EZID service, and cross-organization collaborations, such as CDL’s leadership role in the Research Organization Registry (ROR) as well as PIDapalooza. In the spirit of CDL’s broader mission and vision, the identifiers portfolio aims to enable the discoverability, citability, and long-term stewardship of UC data, research outputs, publications, special collections, and archives. We do this by supporting identifier use and adoption across the UC campuses, enriching CDL’s core infrastructure for publishing, data management, and preservation, and contributing to global identifier initiatives that can help to further amplify UC scholarship.

Identifiers in open research infrastructure

While persistent identifiers themselves are associated with fixity and stability, the landscape in which they are situated is constantly shifting. Research infrastructure is becoming more complex and more connected. Institutions, publishers, funders, and policymakers are under more pressure to track and quantify research activities. Infrastructure providers face growing costs and competition.

Identifiers can help us navigate this landscape and address these challenges. But persistent identifiers alone are not the solution.

A DOI string for a dataset doesn’t tell us anything about what data is captured, or who created it, where that researcher is affiliated, or which funders supported the research project. But if the metadata registered with the DOI includes these details, the identifier becomes meaningful and powerful.
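To make the contrast concrete, here is a minimal sketch in Python. The field names (titles, creators, affiliations, funders) are simplified stand-ins chosen for illustration rather than the exact property names of any registration schema, and every value, including the DOI, is a placeholder.

```python
def missing_context(record):
    """Return the names of the context fields that are absent or empty."""
    wanted = ("titles", "creators", "affiliations", "funders")
    return [name for name in wanted if not record.get(name)]


# A bare identifier: the string alone answers none of the questions above.
bare = {"doi": "10.5072/FK2EXAMPLE"}

# The same identifier with descriptive metadata (all values are placeholders).
rich = {
    "doi": "10.5072/FK2EXAMPLE",
    "titles": ["Example stream temperature readings"],
    "creators": ["Researcher, Example"],
    "affiliations": ["Example University"],
    "funders": ["Example Science Foundation"],
}

print(missing_context(bare))  # every context field is missing
print(missing_context(rich))  # nothing is missing
```

The point of the sketch is only that the useful information lives in the record around the identifier, not in the identifier string.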

While more people are paying attention to identifiers these days—as evidenced by the record attendance at PIDapalooza—there is still a need to drive home the point that the identifiers themselves are not the goal; it’s about what the identifiers can do. In order to fulfill the true promise of identifiers, we need to be able to connect them through open data, open metadata, and open infrastructure. And we need communities to understand how to do this and why it is important.

What does this mean for EZID?

A year ago, we reflected on EZID’s evolution as the service has pivoted from its original foundations to pursue a new vision for the future of identifier services at CDL. In the past year, work on EZID has been focused on modernizing and strengthening its core infrastructure so that we can achieve this vision. Some highlights include:

We completed a migration to Linux2 and are working on upgrading the EZID codebase.
We streamlined core functionality for shoulder handling and identifier minting to remove dependencies on external systems.
We added support for the latest versions of the DataCite and Crossref DOI metadata schemas so that users can include richer metadata for their DOIs and also register DOIs for new content types such as preprints.

As 2021 gets underway, work on EZID and across the identifiers portfolio in general will involve building on these infrastructure efforts to harness the power of PIDs. Our goals fall into three main areas:

Enriching our metadata: Encourage and enable better DOI metadata by helping users take advantage of the DataCite and Crossref schemas to provide more descriptive information that supports tracking and discovery.
Opening and connecting our infrastructure: Help users understand the power of connecting research through identifiers in open infrastructure like the DMPHub, Make Data Count, and DataCite Commons.
Promoting best practices and policies: Provide recommendations and guidance for research stakeholders working with identifiers, such as those summarized in the recent report, Implementing Effective Practices for Data: Recommendations for Collaborative Research Support.

Identifiers are a key aspect of our collective research infrastructure and it is an exciting time to be working on how to best leverage them to connect our communities.

Persistent Identifier Services at CDL: A Rich Tapestry

EZID is one strand in a larger tapestry of persistent identifier activity at CDL. These activities, at their core, are focused on how and where persistent identifiers can help enrich and connect the scholarly outputs and cultural heritage materials of the University of California system. Persistent identifiers in this sense both drive and support CDL’s underlying mission to “provide[s] transformative digital library services, grounded in campus partnerships and extended through external collaborations, that amplify the impact of the libraries, scholarship, and resources of the University of California.”

The past year was a transitional one for EZID in particular and for CDL’s identifier services portfolio in general. In the first half of 2019, we completed a multi-year process to rescope EZID’s DOI services to focus exclusively on UC users. We worked to support non-UC users of our DOI services in setting up direct memberships with Crossref and DataCite. We also welcomed Rushiraj Nenuji to the development team as we said farewell to EZID’s long-time developer and original architect Greg Janée.

Last year, in the midst of these transitions, we posed the following question:

Rather than thinking about EZID solely as a tool or a service, we want to situate it instead as one layer of a deep and broad persistent identifier portfolio at CDL. EZID is a great tool for creating and managing DOIs and ARKs—what else could it do? And how might it also support infrastructure, training, and outreach for a more networked and interoperable scholarly communication ecosystem through the use and coordination of persistent identifiers?

Now, as we kick off the new year, we wanted to provide a brief update on what this persistent identifier services portfolio looks like, and how it will continue to evolve in the months ahead.

EZID remains involved in the day-to-day business of supporting DOI and ARK services for UC campuses as well as ARK services for non-UC EZID members. EZID development work is currently focused on strengthening and upgrading the application for long-term robustness and stability, and reconfiguring the platform to minimize dependencies on external systems. Future development work in the coming months will be geared toward optimizing the EZID user interface and adding more support for different metadata schemas.

From the portfolio perspective, we are working on a number of initiatives to encourage and enable the adoption and use of persistent identifiers across the UCs and beyond. A few examples:

We work closely with CDL’s eScholarship Publishing team to help UC journals obtain Crossref DOIs. An integration between eScholarship and EZID assigns DOIs automatically to eScholarship journal articles and sends the metadata to Crossref. These articles are then available to indexes, libraries, and other third parties, enhancing journals’ exposure and increasing the discoverability of their Open Access content. This service supports about 20 journals and our teams will expand to more publications in the year ahead. Two related efforts concern greater adoption of ARK identifiers for special collections objects (UCSF’s Industry Documents Library is one recent project), and DataCite DOIs for UC data repositories.
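For readers curious about what an automated integration hands to an identifier service, here is a rough sketch of serializing metadata as ANVL-style "name: value" lines, the plain-text format used by the EZID API. This is not the eScholarship code. The function names are hypothetical, the escaping rules reflect our reading of the EZID API documentation, and the `_target` element and example values are placeholders.

```python
import re


def _encode(match):
    # Percent-encode one character, e.g. ":" becomes "%3A".
    return "%%%02X" % ord(match.group(0))


def escape_name(name):
    """Escape characters that would break the name side of a line."""
    return re.sub(r"[%:\r\n]", _encode, name)


def escape_value(value):
    """Escape characters that would break the value side of a line."""
    return re.sub(r"[%\r\n]", _encode, value)


def to_anvl(metadata):
    """Serialize a dict of strings as newline-separated 'name: value' lines."""
    return "\n".join(
        f"{escape_name(name)}: {escape_value(value)}"
        for name, value in metadata.items()
    )


example = {
    "_target": "https://example.org/article/1",  # placeholder landing page
    "note": "50% done\nline two",                # shows the escaping at work
}
print(to_anvl(example))
```

Escaping matters because a stray newline or percent sign in a title or abstract would otherwise be read as the start of a new element.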

Organization identifiers are growing in visibility across the scholarly infrastructure landscape with the launch of the Research Organization Registry (ROR), of which CDL is a founding partner. The ROR registry now includes unique IDs for approximately 97,000 organizations, and these IDs are being supported in both DataCite and Crossref metadata. A number of platforms are integrating or looking to integrate ROR into their systems wherever affiliations are collected. The new Dryad platform was the first to pilot this type of ROR integration, and Dryad now has clean and consistent affiliation data for all of its datasets. With additional integrations expected in the new year, it will become increasingly easy for libraries and research administrators to track and analyze their institutions’ scholarly outputs.
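A platform that collects affiliations usually wants to store ROR IDs in one consistent form. The sketch below normalizes user input to the https://ror.org/ URL form and checks only the shape of the ID: a leading zero, six characters from a reduced alphanumeric alphabet, and two trailing digits. That pattern is our assumption based on ROR's public documentation, the function name is hypothetical, and the sketch does not verify the checksum or confirm that the ID exists in the registry. The sample ID is used purely as a string that fits the pattern.

```python
import re

# Assumed shape: "0", six characters (digits or letters, excluding i, l, o, u),
# then a two-digit checksum. Only the shape is checked here.
_ROR_SUFFIX = re.compile(r"^0[0-9a-hjkmnp-tv-z]{6}[0-9]{2}$")


def normalize_ror(value):
    """Return the https://ror.org/ URL form, or None if the shape looks wrong."""
    suffix = value.strip().lower()
    for prefix in ("https://ror.org/", "http://ror.org/", "ror.org/"):
        if suffix.startswith(prefix):
            suffix = suffix[len(prefix):]
            break
    if _ROR_SUFFIX.match(suffix):
        return "https://ror.org/" + suffix
    return None


print(normalize_ror("ror.org/03yrm5c26"))
print(normalize_ror("not-an-id"))
```

Storing one canonical form is what makes affiliation data "clean and consistent" enough to aggregate across datasets.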

Engaging with the broader PID community is another important aspect of our ongoing work. CDL is a member of the ORCID US Community, joining other institutions around the country in championing adoption and use of ORCID identifiers by UC researchers. We are also a founding sponsor of PIDapalooza, the festival of persistent identifiers now approaching its fourth year. We are collaborating within and beyond the UC in persistent identifier training and outreach, including providing guidance on identifiers for UC librarians, and organizing global workshops for stakeholders and practitioners.

All of these efforts showcase how persistent identifier services capture the spirit of the CDL’s vision as a “catalyst for deeply collaborative solutions providing a rich, intuitive and seamless environment for publishing, sharing and preserving our scholars’ increasingly diverse outputs.”

We are looking forward to the year ahead! As always, get in touch with your ideas and questions.

Passing the Torch of Persistence: EZID Development Update

Persistent identifiers are the backbone of scholarly communication infrastructure and long-term digital preservation, key to supporting a fully networked research ecosystem. CDL’s EZID service has been a leading example in the library and research community for how digital curation tools can enable and be enabled by persistent identifiers. The goal of the EZID service was to make the practice of creating and maintaining persistent identifiers, well, easy, and this remains the core feature of EZID to this day.

Achieving persistence with digital objects is a challenge even with a service like EZID. And sadly, achieving persistence with the people behind such services is its own challenge. This week, CDL officially bids farewell (following a transition we announced last year) to EZID’s lead developer, Greg Janée, who is moving on from the system he built ten years ago and has ably maintained over the past decade—a system known not only for making identifier management easy, but also for its reliability, robust API, impeccable documentation, and stellar uptime stats. As the digital curation landscape has been transformed over the years, with new organizations emerging in the identifier space, EZID has been a model of persistence in more ways than one, setting a standard to follow that will be part of Greg’s enduring legacy.

Fortunately for the UC system and for CDL, we will continue to benefit from Greg’s skills and knowledge as he assumes a new position as Director of the UC Santa Barbara Data Curation Program. We know Greg will bring his deep expertise to a broad range of research and preservation activities at UCSB, and we are looking forward to working with him through the networks and collaborations ongoing between UCSB and CDL.

And fortunately for EZID, Greg is passing the torch to another developer, who will be supporting EZID’s valuable services as we move into this new chapter. The EZID team is thrilled to welcome Rushiraj Nenuji, who joined us on May 1 and will be working as our software developer and technical lead.

Rushiraj is based in Santa Barbara and has a 50-50 split appointment between CDL and UCSB, where he is a Science Software Engineer at the National Center for Ecological Analysis and Synthesis (NCEAS). While he has been spending the last 2 months transitioning onto the EZID team, he is no stranger to UC3 as a past contributor to the Make Data Count project, which integrates with the Dash (soon to be Dryad) data publishing service. Rushiraj’s wide experience in front-end and API software engineering, informatics, and open scientific research infrastructure will be an asset for EZID as we pursue new directions and initiatives for the future of CDL’s identifier services portfolio.

Persistence and impermanence will always exist in tandem. And on this note, we bid farewell to Greg and extend a warm welcome to Rushiraj, who will continue the hard work of making identifiers easy and building on Greg’s efforts while exploring new directions for the future.

As always, if you have questions about EZID or about persistent identifiers in general, feel free to contact us at ezid@cdlib.org.

Where We Go From Here: An Update on EZID


Rather than thinking about EZID solely as a tool or a service, we want to situate it instead as one layer of a deep and broad persistent identifier portfolio at CDL. EZID is a great tool for creating and managing DOIs and ARKs—what else could it do?

For nearly a decade, California Digital Library’s EZID service has been the backbone of efforts to enable the open sharing, publication, and citation of research outputs through the use of persistent identifiers (PIDs) at all levels and layers of the scholarly communication ecosystem. EZID’s identifier services and N2T resolver have been used by institutions and organizations around the globe as well as across the University of California system.

Following the announcement of EZID service changes last August, we are in the final stages of a multi-year process to reposition EZID’s strategic focus and redefine its scope by transferring non-UC DOI clients off of the EZID platform to our partners at DataCite and Crossref. At the same time—and to reiterate what we have communicated previously—much of what EZID does will not be changing at all as a result of this transition: EZID continues to offer DOI services to UC members as before, and remains a key provider of Archival Resource Key (ARK) identifiers for users worldwide.

As we approach the end of this transition and as we begin the new year, we wanted to share some updates with our community about what we’ve been up to, where we’re at right now, and where we’re headed next.

The past

Since its inception, California Digital Library has been committed to providing both technical infrastructure and thought leadership in the persistent identifier space. Against the backdrop of major shifts in the ever-evolving scholarly publishing landscape, EZID has played a key role in helping institutions and individuals make their publications, research outputs, and other scholarly and cultural objects discoverable, citable, and manageable for both immediate and long-term access.

Originally envisioned as a possible one-stop shop for persistent identifiers of all stripes, EZID’s scope over the past decade has been more specifically focused on providing high-quality service for two types of PIDs in particular—Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs)—for clients both within the UC system and around the world.

The recent decision to transition the scope of EZID’s DOI services was motivated by the desire to support the growth of our community partners at DataCite and Crossref while freeing up CDL’s own resources to imagine and embark on new directions for the next 10+ years.

The present

The EZID team has been working steadily since August 2017 to transition existing non-UC DOI clients to DataCite and Crossref. As of January 2019, we have transferred the majority of these clients, and we have been in touch with all of our clients to support them in their transitions.

We understand that the transition process can require resources, time, and coordination, some or all of which may not be easy to come by. For those who are not already aware, we have provided guidance to help clients navigate this process, and we remain available for direct consultations by email and phone. Contact the EZID team if you have questions about this effort.

Meet EZID’s new service manager

In addition to the service changes that EZID has gone through in the past year, the EZID team itself has also been evolving. Following service manager Joan Starr’s retirement in June, CDL’s Perry Willett has been handling the day-to-day responsibilities related to client communications and support as our non-UC clients transition their accounts, all the while maintaining EZID’s service relationships with UC partners.

In November 2018, CDL hired me (Maria Gould) to assume EZID product management and service responsibilities going forward. I wanted to take this opportunity to formally introduce myself to the community, to let our clients and partners know to expect to see a new name and face around these parts. Hello!

The future

So, what does the future look like for EZID? For the time being, expect a combination of business as usual and bigger-picture brainstorming.

While we will continue to provide DOIs and ARKs for UC campuses and ARKs for non-UC clients on a day-to-day basis, we are also turning to the question of how we might leverage our unique capacity and expertise in the PID space to pursue new projects and other opportunities.

As part of this process, we are reframing the way in which we conceptualize EZID’s purpose and scope. Rather than thinking about EZID solely as a tool or a service, we want to situate it instead as one layer of a deep and broad persistent identifier portfolio at CDL. EZID is a great tool for creating and managing DOIs and ARKs—what else could it do? And how might it also support infrastructure, training, and outreach for a more networked and interoperable scholarly communication ecosystem through the use and coordination of persistent identifiers?

CDL has a long history of investing in initiatives aimed at building a more robust and coherent suite of scholarly communication options for the research and library community, and we are committed to renewing these investments in the years to come.

Stay in touch

Whether you are a current or past EZID client, or perhaps merely interested in how persistent identifiers can support scholarly communication, please let us know if you have any thoughts or suggestions about new directions we might pursue in 2019 and beyond. We are keen to understand questions like:

How do organizations and institutions use and benefit from third-party identifier managers?
What are the identifier types that our communities need?
What are the knowledge gaps and training needs?
And more….

We will post more information in this space and conduct more targeted outreach with stakeholders as our plans begin to take shape.

We look forward to being in touch!

DataCite Metadata Schema update

This spring, work is underway on a new version of the DataCite metadata schema. DataCite is a worldwide consortium founded in 2009 dedicated to “helping you find, access, and reuse data.” The principal mechanism for doing so is the registration of digital object identifiers (DOIs) via the member organizations. To make sure dataset citations are easy to find, each registration for a DataCite DOI has to be accompanied by a small set of citation metadata. It is small on purpose: this is intended to be a “big tent” for all research disciplines. DataCite has specified these requirements with a metadata schema.

The team in charge of this task is the Metadata Working Group. This group responds to suggestions from DataCite clients and community members. I chair the group, and my colleagues on the group come from the British Library, GESIS, the TIB, CISTI, and TU Delft.

The new version of the schema, 2.3, will be the first to be paired with a corresponding version in the Dublin Core Application Profile format. It fulfills a commitment that the Working Group made with its first release in January of 2011. The hope is that the application profile will promote interoperability with Dublin Core, a common metadata format in the library community, going forward. We intend to maintain synchronization between the schema and the profile with future versions.

Additional changes will include some new selections for the optional fields including support for a new relationType (isIdenticalTo), and we’re considering a way to specify temporal collection characteristics of the resource being registered. This would mean describing, in simple terms and optionally, a data set collected between two dates. There are a few other changes under discussion as well, so stay tuned.

DataCite metadata is available in the Search interface to the DataCite Metadata Store. The metadata is also exposed for harvest via the OAI-PMH protocol. California Digital Library is a founding member, and our DataCite implementation is the EZID service, which also offers ARKs, an alternative identifier scheme. Please let me know if you have any questions by contacting uc3 at ucop.edu.
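For anyone who has not worked with OAI-PMH before, a harvest is just a series of HTTP requests with a few standard query parameters. Here is a small sketch that builds a ListRecords request URL using the protocol's standard arguments (verb, metadataPrefix, from, until, set). The base URL is a placeholder, not the actual DataCite endpoint, and the function name is our own.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL; no request is sent."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments defined by the protocol.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder base URL; substitute the repository's real OAI-PMH endpoint.
print(list_records_url("https://oai.example.org/oai", from_date="2012-01-01"))
```

A real harvester would also follow the resumptionToken returned with each partial response until the list is complete.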

EZID: now even easier to manage identifiers

EZID, the easy long-term identifier service, just got a new look. EZID lets you create and maintain ARKs and DataCite Digital Object Identifiers (DOIs), and now it’s even easier to use:

One stop for EZID and all EZID information, including webinars, FAQs, and more.
- A clean, bright new look.
- No more hunting across two locations for the materials and information you need.

NEW Manage IDs functions:
- View all identifiers created by the logged-in account;
- View the 10 most recent interactions, based on the account rather than the session;
- See the scope of your identifier work without any API programming.

NEW in the UI: Reserve an Identifier
- Create identifiers early in the research cycle;
- Choose whether to make your identifier public, and reserve it if you’d rather not;
- On the Manage screen, view the identifier’s status (public, reserved, unavailable/just testing).
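The three statuses can be pictured as a small state machine. The sketch below is an illustration only: it assumes that a reserved identifier can be made public, that public and unavailable identifiers can be switched back and forth, and that nothing returns to reserved once it has been public. Those rules are our assumption for the example, not something spelled out in this post, and the names are hypothetical.

```python
# Assumed transitions between identifier statuses (illustrative only).
ALLOWED = {
    "reserved": {"public"},
    "public": {"unavailable"},
    "unavailable": {"public"},
}


def can_change(current, new):
    """Return True if moving from `current` to `new` is an allowed transition."""
    return new in ALLOWED.get(current, set())


print(can_change("reserved", "public"))     # reserving early, publishing later
print(can_change("public", "reserved"))     # a public identifier stays known
```

Modeling status this way is what lets you create identifiers early in the research cycle without exposing them before you are ready.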

In the coming months, we will also be introducing these EZID user interface enhancements:

Enhanced support for DataCite metadata in the UI;
Reporting support for institution-level clients.

So, stay tuned: EZID just gets better and better!