Community-Owned Data Publishing Infrastructure

As a library community, we continue to struggle to find scalable approaches to offering open, shared, sustainable scholarly infrastructure. This is especially true in the data publishing and research data management space where institution-focused approaches to capturing and curating data may be hindering our ability to grow adoption by our researchers.

To break this impasse and jumpstart a new community-led approach, California Digital Library is formally partnering with Dryad to build a globally accessible, transparent, and low-cost data publishing and curation service. The goal of this partnership is to completely reimagine the potential for Dryad, acting as an open, free community hub for collecting and curating data for researchers. It is not intended to compete with existing institution-based services, but to complement and amplify each of our campuses’ efforts.

We hope to start a discussion with institutions worldwide on better ways to support institutions and researchers in the face of rapid commercialization of the research data space. We cannot do this alone. For our collective action to effectively leverage institutional knowledge and serve researchers as end users, we need a diverse group of institutions to participate in defining the goals and values of this activity.

What does this look like?

We are putting the finishing touches on the migration of the Dryad service onto CDL’s technical platform.  Dryad is a trusted name in the researcher community and, with this technical shift, it will be a space where institutional members will have transparent reporting features and the ability to join a global data curation community. Dryad will also be positioned to enhance technical integrations (via API) with publishing partners to seamlessly capture data publications at the time of article publishing. This means that we will be able to simultaneously drive adoption of data publishing and offer digital curation and stewardship in one space.

CDL Awarded IMLS Grant for Community-Owned Data Publishing Infrastructure

Supporting shared scholarly infrastructure must be done by the community and for the community. To help jumpstart this process, California Digital Library and Dryad will be facilitating several one-on-one discussions and community workshops in the coming months to determine the features and services most needed in our community.

Our first community workshop will be held in December after CNI in Washington, DC. With funding from an IMLS National Infrastructure grant, we will host a facilitated discussion on institutional values, needs, and potential community-based business models that meet our collective goals, support our researchers, and create a sustainable, attractive new Dryad service offering. Our goal is to chart a path forward for this movement and gain concrete institutional commitments to joining the Dryad community.

How do you get involved?

Please read the latest blog post from Melissanne Scheld, Dryad’s Executive Director, about the next steps for Dryad.

Institutions: If a member of senior leadership would be interested in participating in our one-day workshop on December 12, 2018, please contact the UC Curation Center (UC3) at CDL for more information.

Don’t Worry: There will be additional workshops planned in the US and abroad. We will keep you posted on future opportunities to get involved in this important initiative. Please contact the UC Curation Center (UC3) at CDL for more information.

 

The blog is cross-posted at CDLinfo: https://www.cdlib.org/cdlinfo/2018/10/24/community-owned-data-publishing-infrastructure/ 

Tackling the storage costs of digital preservation

Over the past year, California Digital Library (CDL) has facilitated a discussion among UC campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs), and University Librarians (ULs) to explore pilot ideas for breaking down the high data storage costs associated with digital preservation. Our goal is to work in small, incremental ways towards building a sustainable and reliable network of storage nodes for sharing and preserving research data that does not rely on uncertain funding sources. We are also looking for ways for campuses to retain copies of their datasets in a financially responsible manner. This exploration is codenamed UCDN (UC Data Network).

UCDN: capital investment into campus storage

During our consultation period, the most popular UCDN pilot idea to emerge was that campuses could break this logjam by making an upfront capital investment in storage. The hypothesis is that if pre-established storage nodes can be leveraged for research data preservation, this could remove or reduce the need for recurring charges to cash-strapped departments. It could also offer a way for each campus to retain copies of its outputs.

This idea gained traction and, over this past summer, three campuses volunteered to participate in a pilot to explore this idea further: UCSF, UC Irvine, and UC Riverside.

New pilot projects at UCSF, UC Irvine, and UC Riverside

Starting in the fall of 2018, IT teams at UCSF, UC Irvine, and UC Riverside are provisioning storage nodes to support this pilot project. As each campus gets its storage online, campus teams that span research offices, IT teams, and libraries will set up new procedures and/or re-evaluate existing procedures regarding the preservation of research data. They will then use this as an opportunity to re-engage with research projects that benefit from this new investment.

In addition to these campus-based collaborations, CDL will also connect to each new storage node as a back-end storage component of our Merritt preservation repository. This will allow us to automatically leverage the new campus investment any time a researcher from one of the pilot campuses uses the Dash data publishing platform to publish their research data. In addition, any researcher from one of these three campuses who is interested in working on other research data preservation projects can contact their local campus teams or UC3 for more information on utilizing this storage.

How can you get involved?

UCDN is a unique approach to back-end storage and preservation for research data. This new pilot is meant to help streamline campus administrative processes and establish more logical resource sharing. Through this, we hope more consistent processes for research data preservation will also emerge.

How can you leverage this back-end system in your research projects? If you are a researcher or research team lead at UCSF, UCI, or UCR, you can utilize these new resources by continuing to use (or beginning to use):

A note about Dash: As we announced in May of 2018, CDL formally partnered with Dryad. We are in the process of migrating Dryad onto the Dash platform, at which point Dash will be rebranded as Dryad. This does not change the UCDN storage node pilot. The new Dryad service will continue to utilize localized shared storage for researchers at UCSF, UC Irvine, and UC Riverside.

Tackling the storage costs of preservation

By relying upon upfront capital investment in storage from IT teams rather than direct campus recharge to individual departments or libraries, we hope to remove common administrative and financial barriers to wider campus adoption of research data preservation. At the same time, we hope this will give campuses a simple and financially sustainable way to retain copies of their research outputs.

Lessons from Dat in the Lab: Webinar

We have received several inquiries about the status of our Dat-in-the-Lab project.  To share our project outputs, we held a webinar on Friday, October 19, 2018.  We spent the webinar showcasing our work and opening up a dialogue with the community on next steps.

As a reminder, the Dat-in-the-Lab project was funded by the Gordon and Betty Moore Foundation and brought together researchers from the Center for Watershed Sciences (UC Davis), The Dawson Lab (UC Merced), UC Conservation Genomic Consortium (UCLA), Internet Archive, San Diego Supercomputer Center (SDSC), California Digital Library, the Dat Project, and Code for Science & Society.

Please learn more about our project and lessons learned by watching the recording of our webinar.

Lessons from Dat in the Lab – Agenda

Friday, October 19, 2018
8 am San Francisco / 11 am New York / 4 pm London / 8:30 pm Delhi

How to watch the webinar

Webinar: Dat in the Lab
Time: Oct 19, 2018 8:00 AM Pacific Time (US and Canada)
Recording of our webinar

Greg Janée in Transition

 

Greg Janée, lead developer for the EZID service, has recently stepped up to be the Director of the Data Curation Program at the UCSB Library. Greg joined CDL’s UC3 team at 50% time 10 years ago and later shifted to CDL’s infrastructure team for a few years. Until June 2019 he will work with UC3 again (at 25% time) as we transition his knowledge and expertise. During that time he will continue to work remotely from Santa Barbara, together with the new Identifier Product Manager / Research Data Specialist, who will start in mid-November.

Greg has had a long career with CDL and UCSB. He has made major contributions to identifiers in digital libraries, especially CDL’s very popular EZID service, and also recently published work on “compact identifiers” and ARKs in the Open. His transition to full-time lead of the UCSB Data Curation Program aligns well with CDL’s efforts in research data management. In this role, Greg also acts as UCSB’s liaison for its participation in Dash (soon to be Dryad), providing replication storage for UCSB’s collections and data publications.