Community-Owned Data Publishing Infrastructure
As a library community, we continue to struggle to find scalable approaches to offering open, shared, sustainable scholarly infrastructure. This is especially true in the data publishing and research data management space where institution-focused approaches to capturing and curating data may be hindering our ability to grow adoption by our researchers.
To alleviate this impasse and jumpstart a new community-led approach, California Digital Library is formally partnering with Dryad to build a globally-accessible, transparent, and low-cost data publishing and curation service. The goal of this partnership is to completely reimagine the potential for Dryad, acting as an open, free community hub for collecting and curating data for researchers. It is not intended to compete with existing institution-based services, but to complement and amplify each of our campuses’ efforts.
We hope that we can start a global discussion with institutions worldwide on better ways to support institutions and researchers in the face of rapid commercialization of the research data space. We cannot do this alone. For our collective action to effectively leverage institutional knowledge and serve researchers as end users, we need a diverse group of institutions to participate in defining the goals and values of this activity.
What does this look like?
We are putting the finishing touches on the migration of the Dryad service onto CDL’s technical platform. Dryad is a trusted name in the researcher community and, with this technical shift, it will be a space where institutional members will have transparent reporting features and the ability to join a global data curation community. Dryad will also be positioned to enhance technical integrations (via API) with publishing partners to seamlessly capture data publications at the time of article publishing. This means that we will be able to simultaneously drive adoption of data publishing and offer digital curation and stewardship in one space.
CDL Awarded IMLS Grant for Community-Owned Data Publishing Infrastructure
Supporting shared scholarly infrastructure must be done by the community and for the community. To help jumpstart this process, California Digital Library and Dryad will be facilitating several one-on-one discussions and community workshops in the coming months to determine the features and services most needed in our community.
Our first community workshop will be held in December after CNI in Washington, DC. With funding from an IMLS National Infrastructure grant, we will host a facilitated discussion on institutional values, needs, and potential community-based business models that meet our collective goals, support our researchers, and create a sustainable, attractive new Dryad service offering. Our goal is to chart a path forward for this movement and gain concrete institutional commitments to joining the Dryad community.
How do you get involved?
Please read the latest blog post from Melissanne Scheld, Dryad’s Executive Director, about the next steps for Dryad.
Institutions: If a member of senior leadership would be interested in participating in our one-day workshop on December 12, 2018, please contact the UC Curation Center (UC3) at CDL for more information.
Don’t Worry: There will be additional workshops planned in the US and abroad. We will keep you posted on future opportunities to get involved in this important initiative. Please contact the UC Curation Center (UC3) at CDL for more information.
The blog is cross-posted at CDLinfo: https://www.cdlib.org/cdlinfo/2018/10/24/community-owned-data-publishing-infrastructure/
Tackling the storage costs of digital preservation
Over the past year, California Digital Library (CDL) has facilitated a discussion between UC campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs), and University Librarians (ULs) to explore pilot ideas for breaking down the high data storage costs associated with digital preservation. Our goal is to work in small, incremental ways towards building a sustainable and reliable network of storage nodes for sharing and preserving research data that does not rely on uncertain funding sources. In addition, we are looking to find ways for campuses to retain copies of their datasets in a financially responsible manner. This exploration is codenamed UCDN (UC Data Network).
UCDN: capital investment into campus storage
During our consultation period, the most popular UCDN pilot idea that materialized was that campuses could break this logjam by making upfront capital investment in storage. The hypothesis is that if pre-established storage nodes can be leveraged for research data preservation, this could remove or reduce the need for recurring charges to cash-strapped departments. It could also offer a way for each campus to retain copies of its outputs.
This idea gained traction and, over this past summer, three campuses volunteered to participate in a pilot to explore this idea further: UCSF, UC Irvine, and UC Riverside.
New pilot projects at UCSF, UC Irvine, and UC Riverside
Starting in the fall of 2018, IT teams at UCSF, UC Irvine, and UC Riverside are provisioning storage nodes to support this pilot project. As each campus gets its storage online, campus teams that span research offices, IT teams, and libraries will set up new procedures and/or re-evaluate existing procedures regarding the preservation of research data. They will then use this as an opportunity to re-engage with research projects that benefit from this new investment.
In addition to these campus-based collaborations, CDL will also connect to each new storage node as back-end storage components of our Merritt preservation repository. This will allow us to automatically leverage the new campus investment any time a researcher from one of the pilot campuses uses the Dash data publishing platform for publishing their research data. In addition, any researcher from one of these three campuses who is interested in working on other research data preservation projects can contact their local campus teams or UC3 for more information on utilizing this storage.
How can you get involved?
UCDN is a unique approach to back-end storage and preservation for research data. This new pilot is meant to help streamline campus administrative processes and establish more logical resource sharing. Through this, we hope it will also allow more consistent processes for research data preservation to emerge.
How can you leverage this back-end system in your research projects? If you are a researcher or research team lead at UCSF, UCI, or UCR, you can utilize these new resources by continuing to use (or beginning to use):
- Dash manual submissions: your campus offers the Dash data publishing platform for sharing datasets. Any researcher from UCSF, UCI, or UCR can sign in at any time and submit a dataset to be published. All deposits are assigned a DataCite DOI to streamline citation and simplify the process for connecting your datasets to journal articles during the publishing process. We will leverage the new storage nodes for all manual deposits from these three campuses. You can learn more here:
- UCSF dash: https://datashare.ucsf.edu/
- UC Irvine dash: http://dash.lib.uci.edu/
- UC Riverside dash: https://dash.ucr.edu/
- Dash by API: Dash offers a sophisticated API for submitting datasets directly from other environments, e.g. electronic notebooks, code repositories, or web scripts. We will leverage the new storage nodes for all API deposits from these three campuses. You can learn more about how to integrate Dash (and digital preservation) by visiting our technical documentation here:
- Technical How-To guide: https://github.com/CDLUC3/stash/blob/master/stash_api/basic_submission.md
- Swagger API documentation: https://dash.ucop.edu/api/docs/index.html
- Additional Projects: Researchers are routinely looking for digital preservation options for their research projects. When none is available, the result can be orphaned datasets (those left on a hard drive, Box, or Drive) or orphaned data projects (those left on old lab webpages or old research collaboration pages). We can help move those datasets into Dash for long-term preservation (leveraging the new storage nodes). If you know of data in need of a long-term home, please contact UC3 or your campus data curation team.
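For readers curious about what a deposit by API could look like, here is a minimal Python sketch. It only builds the HTTP requests and does not send them. The base URL is inferred from the Swagger link above, and the endpoint paths, the metadata field names, the bearer-token header, and the example DOI are all assumptions made for illustration. Please treat the Technical How-To guide and the Swagger documentation as the authority on the real routes, authentication steps, and schema.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL inferred from the Swagger documentation link.
API_BASE = "https://dash.ucop.edu/api"


def build_create_request(token, title, authors, abstract):
    """Build (but do not send) a request that would create a draft dataset."""
    # Hypothetical metadata shape; check the Swagger docs for the real schema.
    payload = {"title": title, "authors": authors, "abstract": abstract}
    return urllib.request.Request(
        url=f"{API_BASE}/datasets",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed token scheme
            "Content-Type": "application/json",
        },
    )


def build_upload_request(token, doi, filename, content):
    """Build (but do not send) a request that would attach one file."""
    # DOIs contain ':' and '/', so percent-encode them for use in a URL path.
    doi_part = urllib.parse.quote(doi, safe="")
    file_part = urllib.parse.quote(filename, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{doi_part}/files/{file_part}",  # hypothetical
        data=content,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    create = build_create_request(
        "EXAMPLE_TOKEN",
        "Example soil core measurements",
        [{"firstName": "Ada", "lastName": "Example"}],
        "Placeholder abstract for illustration only.",
    )
    # Placeholder DOI; a real one would come from the create response.
    upload = build_upload_request(
        "EXAMPLE_TOKEN", "doi:10.5072/FK2example", "cores.csv", b"depth,ph\n"
    )
    for req in (create, upload):
        print(req.get_method(), req.full_url)
    # To actually send a request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect exactly what a script would submit before pointing it at a live repository.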
A note about Dash: As we announced in May of 2018, CDL formally partnered with Dryad. We are in the process of migrating Dryad onto the Dash platform, at which point Dash will be rebranded as Dryad. This does not change the UCDN storage node pilot. The new Dryad service will continue to utilize localized shared storage for researchers at UCSF, UC Irvine, and UC Riverside.
Tackling the storage costs of preservation
By relying upon upfront capital investment of storage from IT teams rather than direct campus recharge to individual departments or libraries, we hope to remove common administrative and financial barriers to wider campus adoption of research data preservation. At the same time, we hope this will enable a simple way for campuses to retain copies of their research outputs in a financially sustainable way.
- Researchers from UCSF, UCR, UCI: To learn more about how you can leverage these resources during the pilot, please contact UC3 or your campus research office, IT departments, or libraries.
- Researchers from other UC campuses: While your campus is not piloting this approach to back-end storage, there are other preservation projects/services you can leverage. Please contact UC3 or your campus research office, IT departments, or libraries for details/ideas.
Lessons from Dat in the Lab: Webinar
We have received several inquiries about the status of our Dat-in-the-Lab project. To share our project outputs, we held a webinar on Friday, October 19, 2018. We spent the webinar showcasing our work and opening up a dialogue with the community on next steps.
Please learn more about our project and lessons learned by watching the recording of our webinar.
Lessons from Dat in the Lab – Agenda
Friday October 19th, 2018
8 am San Francisco / 11 am New York / 4 pm London / 8:30 pm Delhi
- Introduction and overview of the ‘Dat in the Lab’ project
- Anacapa: Archiving and sharing analysis pipelines with Singularity and Dat
- Discussion on containerized workflows and sharing
- Questions and discussion
- What’s next?
How to watch the webinar
Webinar: Dat in the Lab
Time: Oct 19, 2018 8:00 AM Pacific Time (US and Canada)
Recording of our webinar
Greg Janée in Transition

Greg Janée, lead developer for the EZID service, has recently stepped up to be the Director of the Data Curation Program at the UCSB Library. Greg joined CDL’s UC3 team at 50% time 10 years ago and later shifted to CDL’s infrastructure team for a few years. Until June 2019 he will work with UC3 again (at 25% time) as we transition his knowledge and expertise. During that time he will continue to work remotely from Santa Barbara, together with the new Identifier Product Manager / Research Data Specialist, who will start mid-November.
Greg has had a long career with CDL and UCSB. He’s made major contributions to identifiers in digital libraries, especially CDL’s very popular EZID service, and also recently published work on “compact identifiers” and ARKs in the Open. His transition to full-time lead of the UCSB Data Curation Program aligns well with CDL’s efforts in research data management. In this role, Greg also acts as UCSB’s liaison for the campus’s participation in Dash (soon to be Dryad), providing replication storage for UCSB’s collections and data publications.