Tackling the storage costs of digital preservation

John Chodacki

Over the past year, California Digital Library (CDL) has facilitated a discussion among UC campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs), and University Librarians (ULs) to explore pilot ideas for reducing the high data storage costs associated with digital preservation. Our goal is to work in small, incremental ways towards building a sustainable and reliable network of storage nodes for sharing and preserving research data that does not rely on uncertain funding sources. In addition, we are looking for ways for campuses to retain copies of their datasets in a financially responsible manner. This exploration is codenamed UCDN (UC Data Network).

UCDN: capital investment into campus storage

During our consultation period, the most popular UCDN pilot idea to emerge was that campuses could break this logjam by making an upfront capital investment in storage. The hypothesis is that if pre-established storage nodes can be leveraged for research data preservation, campuses could remove or reduce the recurring charges sent to cash-strapped departments. This could also offer a way for each campus to retain copies of its outputs.

This idea gained traction and, over this past summer, three campuses volunteered to participate in a pilot to explore this idea further: UCSF, UC Irvine, and UC Riverside.

New pilot projects at UCSF, UC Irvine, and UC Riverside

Starting in the fall of 2018, IT teams at UCSF, UC Irvine, and UC Riverside are provisioning storage nodes to support this pilot project. As each campus gets its storage online, campus teams that span research offices, IT teams, and libraries will set up new procedures and/or re-evaluate existing procedures regarding the preservation of research data. They will then use this as an opportunity to re-engage with research projects that benefit from this new investment.

In addition to these campus-based collaborations, CDL will connect to each new storage node as a back-end storage component of our Merritt preservation repository. This will allow us to automatically leverage the new campus investment any time a researcher from one of the pilot campuses uses the Dash data publishing platform to publish their research data. Any researcher from one of these three campuses who is interested in working on other research data preservation projects can also contact their local campus teams or UC3 for more information on using this storage.

How can you get involved?

UCDN is a unique approach to back-end storage and preservation for research data. This new pilot is meant to help streamline campus administrative processes and establish more logical resource sharing. Through this, we hope more consistent processes for research data preservation will emerge.

How can you leverage this back-end system in your research projects? If you are a researcher or research team lead at UCSF, UCI, or UCR, you can use this new resource by continuing to use (or beginning to use):

A note about Dash: As we announced in May of 2018, CDL formally partnered with Dryad. We are in the process of migrating Dryad onto the Dash platform, at which point Dash will be rebranded as Dryad. This does not change the UCDN storage node pilot. The new Dryad service will continue to utilize localized shared storage for researchers at UCSF, UC Irvine, and UC Riverside.

Tackling the storage costs of preservation

By relying on upfront capital investment in storage by IT teams rather than direct campus recharges to individual departments or libraries, we hope to remove common administrative and financial barriers to wider campus adoption of research data preservation. At the same time, we hope this will give campuses a simple, financially sustainable way to retain copies of their research outputs.