Over the past year, the California Digital Library (CDL) has facilitated a discussion among UC campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs), and University Librarians (ULs) to explore pilot ideas for reducing the high data storage costs associated with digital preservation. Our goal is to work in small, incremental steps towards a sustainable and reliable network of storage nodes for sharing and preserving research data, one that does not rely on uncertain funding sources. We are also looking for ways for campuses to retain copies of their datasets in a financially responsible manner. This exploration is codenamed UCDN (UC Data Network).
UCDN: capital investment into campus storage
During our consultation period, the most popular UCDN pilot idea to emerge was that campuses could break the cost logjam by making an upfront capital investment in storage. The hypothesis is that if pre-established storage nodes can be leveraged for research data preservation, this could remove or reduce the need to send recurring charges to cash-strapped departments. It could also offer a way for each campus to retain copies of its outputs.
This idea gained traction and, over this past summer, three campuses volunteered to participate in a pilot to explore this idea further: UCSF, UC Irvine, and UC Riverside.
New pilot projects at UCSF, UC Irvine, and UC Riverside
Starting in the fall of 2018, IT teams at UCSF, UC Irvine, and UC Riverside are provisioning storage nodes to support this pilot project. As each campus gets its storage online, campus teams that span research offices, IT teams, and libraries will set up new procedures and/or re-evaluate existing procedures regarding the preservation of research data. They will then use this as an opportunity to re-engage with research projects that benefit from this new investment.
In addition to these campus-based collaborations, CDL will connect each new storage node as a back-end storage component of our Merritt preservation repository. This will allow us to automatically leverage the new campus investment any time a researcher from one of the pilot campuses uses the Dash data publishing platform to publish their research data. Any researcher from one of these three campuses who is interested in working on other research data preservation projects can also contact their local campus teams or UC3 for more information on using this storage.
How can you get involved?
UCDN is a unique approach to back-end storage and preservation for research data. This new pilot is meant to help streamline campus administrative processes and establish more logical resource sharing. We hope it will also allow more consistent processes for research data preservation to emerge.
How can you leverage this back-end system in your research projects? If you are a researcher or research team lead at UCSF, UCI, or UCR, you can use these new resources by continuing to use (or beginning to use):
- Dash manual submissions: Your campus offers the Dash data publishing platform for sharing datasets. Any researcher from UCSF, UCI, or UCR can sign in at any time and submit a dataset to be published. All deposits are assigned a DataCite DOI to streamline citation and simplify connecting your datasets to journal articles during the publishing process. We will leverage the new storage nodes for all manual deposits from these three campuses. You can learn more here:
  - UCSF Dash: https://datashare.ucsf.edu/
  - UC Irvine Dash: http://dash.lib.uci.edu/
  - UC Riverside Dash: https://dash.ucr.edu/
- Dash by API: Dash offers a sophisticated API for submitting datasets directly from other environments, such as electronic notebooks, code repositories, and web scripts. We will leverage the new storage nodes for all API deposits from these three campuses. You can learn more about integrating Dash (and digital preservation) in our technical documentation:
  - Technical How-To guide: https://github.com/CDLUC3/stash/blob/master/stash_api/basic_submission.md
  - Swagger API documentation: https://dash.ucop.edu/api/docs/index.html
- Additional Projects: Researchers are routinely looking for digital preservation options for their research projects. When none is available, the result can be orphaned datasets (those left on a hard drive, Box, or Drive) or orphaned data projects (those left on old lab webpages or old research collaboration pages). We can help move those datasets into Dash for long-term preservation (leveraging the new storage nodes). If you know of data in need of a long-term home, please contact UC3 or your campus data curation team.
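For the API route listed above, it can help to picture a deposit as a short sequence of steps before reading the technical documentation. The sketch below is only an illustration and sends nothing over the network. The class names, the metadata keys (`firstName`, `lastName`, `affiliation`), and the four step labels are assumptions made for this example, not the Dash API's actual schema or endpoints. The How-To guide and Swagger documentation linked above remain the authority on real request formats.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Author:
    first_name: str
    last_name: str
    affiliation: str


@dataclass
class DatasetDraft:
    """Hypothetical container for the descriptive metadata of one deposit."""
    title: str
    abstract: str
    authors: List[Author] = field(default_factory=list)

    def to_payload(self) -> Dict[str, object]:
        # Key names here are illustrative assumptions, not the real schema.
        if not self.title.strip():
            raise ValueError("a dataset needs a title")
        if not self.authors:
            raise ValueError("a dataset needs at least one author")
        return {
            "title": self.title.strip(),
            "abstract": self.abstract.strip(),
            "authors": [
                {
                    "firstName": a.first_name,
                    "lastName": a.last_name,
                    "affiliation": a.affiliation,
                }
                for a in self.authors
            ],
        }


def plan_deposit(draft: DatasetDraft, filenames: List[str]) -> List[Tuple[str, str]]:
    """List the steps a scripted deposit might take, without sending anything."""
    payload = json.dumps(draft.to_payload(), sort_keys=True)
    steps = [
        ("authenticate", "obtain an access token"),
        ("create", payload),
    ]
    # One upload step per data file.
    steps += [("upload", name) for name in filenames]
    steps.append(("submit", "request publication and DOI assignment"))
    return steps


if __name__ == "__main__":
    draft = DatasetDraft(
        title="Soil cores 2018",
        abstract="Example abstract.",
        authors=[Author("Ada", "Lee", "UC Riverside")],
    )
    for action, detail in plan_deposit(draft, ["cores.csv", "README.txt"]):
        print(f"{action}: {detail}")
```

Validating the metadata before any request is made means an incomplete draft fails early, on the researcher's machine, rather than partway through a deposit.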
A note about Dash: As we announced in May of 2018, CDL formally partnered with Dryad. We are in the process of migrating Dryad onto the Dash platform, at which point Dash will be rebranded as Dryad. This does not change the UCDN storage node pilot. The new Dryad service will continue to utilize localized shared storage for researchers at UCSF, UC Irvine, and UC Riverside.
Tackling the storage costs of preservation
By relying on upfront capital investment in storage from IT teams rather than direct campus recharge to individual departments or libraries, we hope to remove common administrative and financial barriers to wider campus adoption of research data preservation. At the same time, we hope this will give campuses a simple, financially sustainable way to retain copies of their research outputs.
- Researchers from UCSF, UCI, and UCR: To learn more about how you can leverage these resources during the pilot, please contact UC3 or your campus research office, IT department, or library.
- Researchers from other UC campuses: While your campus is not piloting this approach to back-end storage, there are other preservation projects and services you can leverage. Please contact UC3 or your campus research office, IT department, or library for details and ideas.