Where We Go From Here: An Update on EZID

For nearly a decade, California Digital Library’s EZID service has been the backbone of efforts to enable the open sharing, publication, and citation of research outputs through the use of persistent identifiers (PIDs) at all levels and layers of the scholarly communication ecosystem. EZID’s identifier services and N2T resolver have been used by institutions and organizations around the globe as well as across the University of California system.

Following the announcement of EZID service changes last August, we are in the final stages of a multi-year process to reposition EZID’s strategic focus and redefine its scope by transferring non-UC DOI clients off of the EZID platform to our partners at DataCite and Crossref. At the same time—and to reiterate what we have communicated previously—much of what EZID does will not be changing at all as a result of this transition: EZID continues to offer DOI services to UC members as before, and remains a key provider of Archival Resource Key (ARK) identifiers for users worldwide.

As we approach the end of this transition and as we begin the new year, we wanted to share some updates with our community about what we’ve been up to, where we’re at right now, and where we’re headed next.

The past

Since its inception, California Digital Library has been committed to providing both technical infrastructure and thought leadership in the persistent identifier space. Against the backdrop of major shifts in the ever-evolving scholarly publishing landscape, EZID has played a key role in helping institutions and individuals make their publications, research outputs, and other scholarly and cultural objects discoverable, citable, and manageable for both immediate and long-term access.

Originally envisioned as a possible one-stop shop for persistent identifiers of all stripes, EZID’s scope over the past decade has been more specifically focused on providing high-quality service for two types of PIDs in particular—Digital Object Identifiers (DOIs) and Archival Resource Keys (ARKs)—for clients both within the UC system and around the world.

The recent decision to transition the scope of EZID’s DOI services was motivated by the desire to support the growth of our community partners at DataCite and Crossref while freeing up CDL’s own resources to imagine and embark on new directions for the next 10+ years.

The present

The EZID team has been working steadily since August 2017 to transition existing non-UC DOI clients to DataCite and Crossref. As of January 2019, we have transferred the majority of these clients, and we have been in touch with all of our clients to support them in their transitions.

We understand that the transition process can require resources, time, and coordination, some or all of which may not be easy to come by. For those who are not already aware, we have provided guidance to help clients navigate this process, and we remain available for direct consultations by email and phone. Contact the EZID team if you have questions about this effort.

Meet EZID’s new service manager

In addition to the service changes that EZID has gone through in the past year, the EZID team itself has also been evolving. Following service manager Joan Starr’s retirement in June, CDL’s Perry Willett has been handling the day-to-day responsibilities related to client communications and support as our non-UC clients transition their accounts, all the while maintaining EZID’s service relationships with UC partners.

In November 2018, CDL hired me (Maria Gould) to assume EZID product management and service responsibilities going forward. I wanted to take this opportunity to formally introduce myself to the community and to let our clients and partners know to expect a new name and face around these parts. Hello!

The future

So, what does the future look like for EZID? For the time being, expect a combination of business as usual and bigger-picture brainstorming.

While we will continue to provide DOIs and ARKs for UC campuses and ARKs for non-UC clients on a day-to-day basis, we are also turning to the question of how we might leverage our unique capacity and expertise in the PID space to pursue new projects and other opportunities.

As part of this process, we are reframing the way in which we conceptualize EZID’s purpose and scope. Rather than thinking about EZID solely as a tool or a service, we want to situate it instead as one layer of a deep and broad persistent identifier portfolio at CDL. EZID is a great tool for creating and managing DOIs and ARKs—what else could it do? And how might it also support infrastructure, training, and outreach for a more networked and interoperable scholarly communication ecosystem through the use and coordination of persistent identifiers?

CDL has a long history of investing in initiatives aimed at building a more robust and coherent suite of scholarly communication options for the research and library community, and we are committed to renewing these investments in the years to come.

Stay in touch

Whether you are a current or past EZID client, or perhaps merely interested in how persistent identifiers can support scholarly communication, please let us know if you have any thoughts or suggestions about new directions we might pursue in 2019 and beyond. We are keen to understand questions like:

- How do organizations and institutions use and benefit from third-party identifier managers?
- What are the identifier types that our communities need?
- What are the knowledge gaps and training needs?
- And more…

We will post more information in this space and conduct more targeted outreach with stakeholders as our plans begin to take shape.

We look forward to being in touch!

Resources, and Versions, and Identifiers! Oh, my!

The only constant is change. —Heraclitus

Data publication, management, and citation would all be so much easier if data never changed, or at least, if it never changed after publication. But as the Greeks observed so long ago, change is here to stay. We must accept that data will change, and given that fact, we are probably better off embracing change rather than avoiding it. Because the very essence of data citation is identifying what was referenced at the time it was referenced, we need to be able to put a name on that referenced quantity, which leads to the requirement of assigning named versions to data. With versions we are providing the x that enables somebody to say, “I used version x of dataset y.”

Since versions are ultimately names, the problem of defining versions is inextricably bound up with the general problem of identification. Key questions that must be asked when addressing data versioning and identification include:

What is being identified by a version? This can be a surprisingly subtle question. Is a particular set of bits being identified? A conceptual quantity (to use FRBR terms, an expression or manifestation)? A location? A conceptual quantity at a location? For a resource that changes rapidly or predictably, such as a data stream that accumulates over time, it will probably be necessary to address the structure of the stream separately from the content of the stream, and to support versions and/or citation mechanisms that allow the state of the stream to be characterized at the time of reference. In any case, the answer to the question of what is being identified will greatly impact both what constitutes change (and therefore what constitutes a version) and the appropriateness of different identifier technologies to identifying those versions.
When does a change constitute a new version? Always? Even when only a typographical error is being corrected? Or, in a hypertext document, when updating a broken hyperlink? (This is a particularly difficult case, since updating a hyperlink requires updating the document, of course, but a URL is really a property of the identifiee, not the identifier.) In the case of a science dataset, does changing the format of the data constitute a new version? Reorganizing the data within a format (e.g., changing from row-major to column-major order)? Re-computing the data on different floating-point hardware? Versions are often divided into “major” versions and “minor” versions to help characterize the magnitude and backward-compatibility of changes.
Is each version an independent resource? Or is there one resource that contains multiple versions? This may seem a purely semantic distinction, but the question has implications on how the resource is managed in practice. The W3C struggled with this question in identifying the HTML specification. It could have created one HTML resource with many versions (3.1, 4.2, 5, …), but for manageability it settled on calling HTML3 one resource (with versions 3.1, 3.2, etc.), HTML4 a separate resource (with analogous versions 4.1, 4.2, etc.), and continuing on to HTML5 as yet another resource.

So far we have only raised questions, and that’s the nature of dealing with versions: the answers tend to be very situation-specific. Fortunately, some broad guidelines have emerged:

- Assign an identifier to each version to support identification and citation.
- Assign an identifier to the resource as a whole, that is, to the resource without considering any particular version of the resource. There are many situations where it is desirable to be able to make a version-agnostic reference. Consider that, in the text above, we were able to refer to something called “HTML4” without having to name any particular version of that resource. What if that were not possible?
- Provide linkages between the versions, and between the versions and the resource as a whole.
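
As a rough illustration of these guidelines, the sketch below models a resource with one version-agnostic identifier, one identifier per version, and explicit links between them. The class names, field names, and sample identifiers are all hypothetical and chosen only for illustration; they do not correspond to any EZID or DataCite API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    identifier: str                     # identifier for this specific version
    label: str                          # version name, e.g. "1" or "2.1"
    resource_id: str                    # link back to the resource as a whole
    previous_id: Optional[str] = None   # link to the preceding version, if any
    next_id: Optional[str] = None       # link to the following version, if any


@dataclass
class Resource:
    identifier: str                     # version-agnostic identifier
    versions: List[Version] = field(default_factory=list)

    def add_version(self, identifier: str, label: str) -> Version:
        """Register a new version and record links to its neighbours."""
        version = Version(identifier, label, resource_id=self.identifier)
        if self.versions:
            latest = self.versions[-1]
            latest.next_id = version.identifier
            version.previous_id = latest.identifier
        self.versions.append(version)
        return version

    def latest(self) -> Optional[Version]:
        """Return the most recently added version, or None if there is none."""
        return self.versions[-1] if self.versions else None


# Hypothetical identifiers, for illustration only.
dataset = Resource("example:dataset-y")
first = dataset.add_version("example:dataset-y-a1", "1")
second = dataset.add_version("example:dataset-y-b2", "2")
```

A citation of "version 2 of dataset y" would then use `second.identifier`, while a version-agnostic reference would use `dataset.identifier`. Every stored link is something the maintainer has to keep correct over time.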

These guidelines still leave the question of how to actually assign identifiers to versions unanswered. One approach is to assign a different, unrelated identifier to each version. For example, doi:10.1234/FOO might refer to version 1 of a resource and doi:10.5678/BAR to version 2. Linkages, stored in the resource versions themselves or externally in a database, can record the relationships between these identifiers. This approach may be appropriate in many cases, but it should be recognized that it places a burden on both the resource maintainer (every link that must be maintained represents a breakage point) and user (there is no easily visible or otherwise obvious relationship between the identifiers).

Another approach is to syntactically encode version information in the identifiers. With this approach, we might start with doi:10.1234/FOO as a base identifier for the resource, and then append version information in a visually apparent way. For example, doi:10.1234/FOO/v1 might refer to version 1, doi:10.1234/FOO/v2 to version 2, and so forth. And in a logical extension we could then treat the version-less identifier doi:10.1234/FOO as identifying the resource as a whole. This is exactly the approach used by the arXiv preprint service.
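
The syntactic approach can be sketched in a few lines. The example below assumes the "/vN" suffix convention used in the doi:10.1234/FOO example; the function names `split_version` and `with_version` are hypothetical, and real identifier schemes may encode versions differently.

```python
import re
from typing import Optional, Tuple

# Assumed convention: "<base>/v<N>" names version N of a resource,
# and the bare "<base>" names the resource as a whole.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+)/v(?P<version>[1-9][0-9]*)$")


def split_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Return (base identifier, version number), or (identifier, None)
    when the identifier carries no version suffix."""
    match = _VERSION_SUFFIX.match(identifier)
    if match is None:
        return identifier, None
    return match.group("base"), int(match.group("version"))


def with_version(base: str, version: int) -> str:
    """Build the identifier for a specific version of a base identifier."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{base}/v{version}"


base, version = split_version("doi:10.1234/FOO/v2")
# base is "doi:10.1234/FOO" and version is 2
```

Because the relationship is visible in the identifier itself, a user can move from any version to the resource as a whole without consulting an external link table.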

Resources, versions, identifiers, citations: the issues they present tend to get bound up in a Gordian knot. Oh, my!

DataCite Metadata Schema update

This spring, work is underway on a new version of the DataCite metadata schema. DataCite is a worldwide consortium founded in 2009 dedicated to “helping you find, access, and reuse data.” The principal mechanism for doing so is the registration of digital object identifiers (DOIs) via the member organizations. To make sure dataset citations are easy to find, each registration for a DataCite DOI has to be accompanied by a small set of citation metadata. It is small on purpose: this is intended to be a “big tent” for all research disciplines. DataCite has specified these requirements with a metadata schema.

The team in charge of this task is the Metadata Working Group. This group responds to suggestions from DataCite clients and community members. I chair the group, and my colleagues on the group come from the British Library, GESIS, the TIB, CISTI, and TU Delft.

The new version of the schema, 2.3, will be the first to be paired with a corresponding version in the Dublin Core Application Profile format. It fulfills a commitment that the Working Group made with its first release in January of 2011. The hope is that the application profile will promote interoperability with Dublin Core, a common metadata format in the library community, going forward. We intend to maintain synchronization between the schema and the profile with future versions.

Additional changes will include some new selections for the optional fields including support for a new relationType (isIdenticalTo), and we’re considering a way to specify temporal collection characteristics of the resource being registered. This would mean describing, in simple terms and optionally, a data set collected between two dates. There are a few other changes under discussion as well, so stay tuned.

DataCite metadata is available in the Search interface to the DataCite Metadata Store. The metadata is also exposed for harvest via the OAI-PMH protocol. California Digital Library is a founding member of DataCite, and our DataCite implementation is the EZID service, which also offers ARKs, an alternative identifier scheme. Please let me know if you have any questions by contacting uc3 at ucop.edu.
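
For readers unfamiliar with OAI-PMH harvesting, the sketch below builds a `ListRecords` request URL using the protocol's standard query parameters (`verb`, `metadataPrefix`, `from`, `until`, `set`). The base URL is a placeholder rather than the actual DataCite endpoint, and the helper name `list_records_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint (an assumption): substitute the repository's real
# OAI-PMH base URL before use.
BASE_URL = "https://oai.example.org/oai"


def list_records_url(
    base_url: str = BASE_URL,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    Dates are passed through unchanged and are expected to be in the
    protocol's UTC date format, e.g. "2012-01-01".
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(from_date="2012-01-01")
```

Fetching the resulting URL returns an XML response. Large result sets are paged with a `resumptionToken`, which this sketch does not handle.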

EZID: now even easier to manage identifiers

EZID, the easy long-term identifier service, just got a new look. EZID lets you create and maintain ARKs and DataCite Digital Object Identifiers (DOIs), and now it’s even easier to use:

One stop for EZID and all EZID information, including webinars, FAQs, and more.
- A clean, bright new look.
- No more hunting across two locations for the materials and information you need.

NEW Manage IDs functions:
- View all identifiers created by the logged-in account;
- View the 10 most recent interactions, based on the account rather than the session;
- See the scope of your identifier work without any API programming.

NEW in the UI: Reserve an Identifier
- Create identifiers early in the research cycle;
- Choose whether to make your identifier public, or reserve it if you are not ready to;
- On the Manage screen, view the identifier’s status (public, reserved, unavailable/just testing).

In the coming months, we will also be introducing these EZID user interface enhancements:

- Enhanced support for DataCite metadata in the UI;
- Reporting support for institution-level clients.

So, stay tuned: EZID just gets better and better!