Scholars at the University of California need effective solutions to preserve their research data. This is essential for complying with funder mandates, publication requirements, policies, and evolving norms of scholarly best practice. However, several cost barriers have impeded consistent, comprehensive preservation of UC research data. In an attempt to tackle some of these challenges, California Digital Library (CDL) brought together campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs)/Research IT, and University Librarians (ULs) from across the UC system to explore the creation of a UC Data Network (UCDN) as a distributed storage solution.
For the past 18 months, CDL has led an exploratory pilot preservation project to establish UCDN with three campuses. We have now decided to conclude this pilot and want to take this opportunity to reflect on our successes and challenges in tackling such an ambitious scope of work. There are many lessons learned. We offer this post as a way of capturing some of the main findings and takeaways of the UCDN activities.
UCDN pilot project
Campuses routinely grapple with how to offer long-term preservation for the research data our researchers create. The goal of the UCDN project was to chip away at one consistent hurdle: recurring data storage costs associated with long-term digital preservation. In early 2018, we brought together VCRs, CIOs, and ULs across the UC system to explore pilot ideas for tackling this hurdle. From those consultations we crafted a pilot project: pilot campuses would make upfront capital investments in storage and CDL would plug that storage into our Merritt preservation repository. This storage, via the preservation repository, would then be used by UC’s Dash data publishing platform. In essence, the pilot entailed moving the costs of preserving published datasets from a recurring individual campus expense to a shared UC-wide investment.
What we learned
After nearly 18 months, we have decided to conclude the UCDN pilot. We have learned several lessons that can help guide where we go next.
Lesson #1. We need to make preservation a more compelling story for users. It was difficult to demonstrate UCDN’s value to researchers. We were piloting a service that focused on the back-end storage costs for back-end preservation services. This was not an easy story to tell, and quite often our message was lost in our outreach to campuses and researchers when we tried to describe this relationship.
Lesson #2. Project ownership is key. We knew that buy-in from multiple departments was essential to the success of UCDN. Campus IT teams, libraries, and research offices all needed to own this effort, and we were successful in getting traction at the beginning. However, as time progressed and storage provisioning became the one immediate task, we saw that the project lost broad ownership. While commitment remained high, we were not able to find specific champions to ensure the pilot remained a top priority.
Lesson #3. Smaller scale ≠ smaller scope. We started the project knowing that multiple campuses provisioning and maintaining storage for a pilot might be risky. To help mitigate this, we started with a set of 3-4 campuses. This smaller set of campuses, however, did not reduce the overall complexity of the project and we quickly saw that reducing the scale of the pilot did not reduce the scope of the effort: instead of working on a small pilot, we ended up trying to achieve a full solution at fewer places.
Lesson #4. Systemwide efforts are not necessarily (or uniquely) efficient. Our original premise was that a systemwide effort at data preservation would be the most efficient approach. However, as the pilot progressed, we realized that the wider academic community beyond UC was also grappling with similar cost issues. Pilot team members realized that appropriate economies of scale should actually come from collaborations beyond the UC system.
Lesson #5. We need to keep our eyes on the prize. Our original goal was to remove the cost barriers to data preservation. The UCDN pilot team remained focused on this as our goal and the pilot experience gave us the space to brainstorm alternative approaches to tackling this issue. This consistent focus on our ultimate goal eventually led to the partnership CDL forged with Dryad (described further below).
While we have decided not to continue with the UCDN pilot, we are now in a position to apply the lessons learned and achieve the original goals of the UCDN effort by focusing time and resources on our new Dryad partnership.
CDL is now putting the finishing touches on the rollout of the Dryad data publishing service across all UC campuses. Dryad is a trusted name in the researcher community and, with this new arrangement, it will be a space where UC researchers can publish their datasets in a repository with consistent preservation policies at no cost to the researcher, department, or campus. This means that UC will be able to simultaneously drive adoption of data publishing and long-term stewardship in one space…and without the hurdles associated with recurring storage costs. And with this, we will have met the original goals of the UCDN project.