Community-Owned Data Publishing Infrastructure
As a library community, we continue to struggle to find scalable approaches to offering open, shared, sustainable scholarly infrastructure. This is especially true in the data publishing and research data management space where institution-focused approaches to capturing and curating data may be hindering our ability to grow adoption by our researchers.
To alleviate this impasse and jumpstart a new community-led approach, California Digital Library is formally partnering with Dryad to build a globally-accessible, transparent, and low-cost data publishing and curation service. The goal of this partnership is to completely reimagine Dryad as an open, free community hub for collecting and curating data for researchers. It is not intended to compete with existing institution-based services, but to complement and amplify each of our campuses’ efforts.
We hope that we can start a global discussion with institutions worldwide on better ways to support institutions and researchers in the face of rapid commercialization of the research data space. We cannot do this alone. For our collective action to effectively leverage institutional knowledge and serve researchers as end users, we need a diverse group of institutions to participate in defining the goals and values of this activity.
What does this look like?
We are putting the finishing touches on the migration of the Dryad service onto CDL’s technical platform. Dryad is a trusted name in the researcher community and, with this technical shift, it will be a space where institutional members will have transparent reporting features and the ability to join a global data curation community. Dryad will also be positioned to enhance technical integrations (via API) with publishing partners to seamlessly capture data publications at the time of article publishing. This means that we will be able to simultaneously drive adoption of data publishing and offer digital curation and stewardship in one space.
CDL Awarded IMLS Grant for Community-Owned Data Publishing Infrastructure
Supporting shared scholarly infrastructure must be done by the community and for the community. To help jumpstart this process, California Digital Library and Dryad will facilitate several one-on-one discussions and community workshops in the coming months to determine the features and services most needed in our community.
Our first community workshop will be held in December after CNI in Washington, DC. With funding from an IMLS National Infrastructure grant, we will host a facilitated discussion on institutional values, needs, and potential community-based business models that meet our collective goals, support our researchers, and create a sustainable, attractive new Dryad service offering. Our goal is to chart a path forward for this movement and gain concrete institutional commitments to joining the Dryad community.
How do you get involved?
Please read the latest blog post from Melissanne Scheld, Dryad’s Executive Director, about the next steps for Dryad.
Institutions: If a member of senior leadership would be interested in participating in our one-day workshop on December 12, 2018, please contact the UC Curation Center (UC3) at CDL for more information.
Don’t worry: additional workshops are planned in the US and abroad. We will keep you posted on future opportunities to get involved in this important initiative. Please contact the UC Curation Center (UC3) at CDL for more information.
The blog is cross-posted at CDLinfo: https://www.cdlib.org/cdlinfo/2018/10/24/community-owned-data-publishing-infrastructure/
Tackling the storage costs of digital preservation
Over the past year, California Digital Library (CDL) has facilitated a discussion between UC campus Vice Chancellors of Research (VCRs), Chief Information Officers (CIOs), and University Librarians (ULs) to explore pilot ideas for breaking down the high data storage costs associated with digital preservation. Our goal is to work in small, incremental ways towards building a sustainable and reliable network of storage nodes for sharing and preserving research data that does not rely on uncertain funding sources. In addition, we are looking to find ways for campuses to retain copies of their datasets in a financially responsible manner. This exploration is codenamed UCDN (UC Data Network).
UCDN: capital investment into campus storage
During our consultation period, the most popular UCDN pilot idea that materialized was that campuses could break this logjam by making upfront capital investments in storage. The hypothesis is that if pre-established storage nodes can be leveraged for research data preservation, recurring charges to cash-strapped departments could be removed or reduced, and each campus would gain a way to retain copies of its outputs.
This idea gained traction and, over this past summer, three campuses volunteered to participate in a pilot to explore this idea further: UCSF, UC Irvine, and UC Riverside.
New pilot projects at UCSF, UC Irvine, and UC Riverside
Starting in the fall of 2018, IT teams at UCSF, UC Irvine, and UC Riverside are provisioning storage nodes to support this pilot project. As each campus gets its storage online, campus teams that span research offices, IT teams, and libraries will set up new procedures and/or re-evaluate existing procedures regarding the preservation of research data. They will then use this as an opportunity to re-engage with research projects that benefit from this new investment.
In addition to these campus-based collaborations, CDL will also connect to each new storage node as back-end storage components of our Merritt preservation repository. This will allow us to automatically leverage the new campus investment any time a researcher from one of the pilot campuses uses the Dash data publishing platform for publishing their research data. In addition, any researcher from one of these three campuses who is interested in working on other research data preservation projects can contact their local campus teams or UC3 for more information on utilizing this storage.
How can you get involved?
UCDN is a unique approach to back-end storage and preservation for research data. This new pilot is meant to help with streamlining campus administrative processes and establish more logical resource sharing. Through this, we hope it will also allow for more consistent processes for research data preservation to emerge.
How can you leverage this back-end system in your research projects? If you are a researcher or research team lead at UCSF, UCI, or UCR, you can utilize these new resources by continuing to use (or beginning to use):
- Dash manual submissions: your campus offers the Dash data publishing platform for sharing datasets. Any researcher from UCSF, UCI, or UCR can sign in at any time and submit a dataset to be published. All deposits are assigned a DataCite DOI to streamline citation and simplify the process for connecting your datasets to journal articles during the publishing process. We will leverage the new storage nodes for all manual deposits from these three campuses. You can learn more here:
- UCSF dash: https://datashare.ucsf.edu/
- UC Irvine dash: http://dash.lib.uci.edu/
- UC Riverside dash: https://dash.ucr.edu/
- Dash by API: Dash offers a sophisticated API for submitting datasets directly from other environments (e.g., electronic lab notebooks, code repositories, web scripts). We will leverage the new storage nodes for all API deposits from these three campuses. You can learn more about how to integrate Dash (and digital preservation) by visiting our tech documentation here:
- Technical How-To guide: https://github.com/CDLUC3/stash/blob/master/stash_api/basic_submission.md
- Swagger API documentation: https://dash.ucop.edu/api/docs/index.html
- Additional Projects: Researchers are routinely looking for digital preservation options for their research projects. When this is not available, this can sometimes result in orphaned datasets (those left on a hard drive, Box, or Drive) or orphaned data projects (those left on old lab webpages or old research collaboration pages). We can help move those datasets into Dash for long-term preservation (leveraging the new storage nodes). If you know of data in need of a long-term home, please contact UC3 or your campus data curation team:
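To make the API route concrete, here is a minimal sketch of a Dash submission in Python. The endpoint path, metadata field names, and authentication style below are illustrative assumptions, not the authoritative request format; consult the How-To guide and Swagger documentation linked above before integrating.

```python
import json
import urllib.request

# Assumed API base URL for illustration; see the Swagger docs for the real one.
API_BASE = "https://dash.ucop.edu/api"


def build_dataset_payload(title, first_name, last_name, affiliation, abstract):
    """Assemble JSON metadata for a new dataset submission.

    Field names here are illustrative; the actual schema is defined
    in the Dash API documentation.
    """
    return {
        "title": title,
        "authors": [
            {
                "firstName": first_name,
                "lastName": last_name,
                "affiliation": affiliation,
            }
        ],
        "abstract": abstract,
    }


def submit_dataset(payload, token):
    """POST the dataset metadata to the (assumed) datasets endpoint."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # token from your Dash account
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build (but do not send) an example submission:
payload = build_dataset_payload(
    "eDNA sequences from coastal sites",
    "Jane", "Researcher", "UC Irvine",
    "Example abstract describing the dataset.",
)
print(payload["title"])
```

A successful submission returns dataset metadata including the newly minted DataCite DOI, which you can then cite in the associated journal article.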
A note about Dash: As we announced in May of 2018, CDL formally partnered with Dryad. We are in the process of migrating Dryad onto the Dash platform, at which point Dash will be rebranded as Dryad. This does not change the UCDN storage node pilot. The new Dryad service will continue to utilize localized shared storage for researchers at UCSF, UC Irvine, and UC Riverside.
Tackling the storage costs of preservation
By relying upon upfront capital investment of storage from IT teams rather than direct campus recharge to individual departments or libraries, we hope to remove common administrative and financial barriers to wider campus adoption of research data preservation. At the same time, we hope this will enable a simple way for campuses to retain copies of their research outputs in a financially sustainable way.
- Researchers from UCSF, UCR, UCI: To learn more about how you can leverage these resources during the pilot, please contact UC3 or your campus research office, IT departments, or libraries.
- Researchers from other UC campuses: While your campus is not piloting this approach to back-end storage, there are other preservation projects/services you can leverage. Please contact UC3 or your campus research office, IT departments, or libraries for details/ideas.
Lessons from Dat in the Lab: Webinar
We have received several inquiries about the status of our Dat-in-the-Lab project. To share our project outputs, we held a webinar on Friday, October 19, 2018. We spent the webinar showcasing our work and opening up a dialogue with the community on next steps.
Please learn more about our project and lessons learned by watching the recording of our webinar.
Lessons from Dat in the Lab – Agenda
Friday October 19th, 2018
8 am San Francisco / 11 am New York / 4 pm London / 8:30 pm Delhi
- Introduction and overview of the ‘Dat in the Lab’ project
- Anacapa: Archiving and sharing analysis pipelines with Singularity and Dat
- Discussion on containerized workflows and sharing
- Questions and discussion
- What’s next?
How to watch the webinar
Webinar: Dat in the Lab
Time: Oct 19, 2018 8:00 AM Pacific Time (US and Canada)
Recording of our webinar
dat-in-the-lab: Announcing The Dat Anacapa Container
Today we are releasing Anacapa Container, which enables reproducibility of research environments and data across campuses.
If you’ve been following our work over the last year, you’ll be aware of the Dat in the Lab project, funded by the Gordon and Betty Moore Foundation (read our previous writeups on a lab visit, eDNA, and containerization challenges). As this project comes to a close, we are excited to release this final piece of work. A final project wrap-up will be released later this fall.
The Anacapa Container project has been a collaboration between the Code for Science & Society team and researchers at five different University of California campuses: UCLA, UC Merced, UC Davis, UC Riverside and UC Santa Cruz. Our goal was to take the Anacapa pipeline from UCLA and use a combination of Dat plus containerization technologies to replicate the pipeline across the various University of California research cluster environments.
The Anacapa pipeline itself is a collection of software written in Bash, Python, R and Perl that takes eDNA sequences and performs computationally expensive and complex analysis on the data to do things such as detect which species were in the sample. Anacapa is the core analysis tool for the CALeDNA consortium, and there are a number of collaborating institutions within California that wish to use the pipeline. Additionally, there are now a growing number of research groups world-wide who are interested in re-using the Anacapa pipeline for their own local eDNA research.
Problem: Complex Software Installation
One of the most challenging parts of using any complex scientific pipeline is installing all of the necessary software dependencies to run it. This may not seem challenging at first, but scientific software is usually poorly documented and rarely tested on research servers beyond the ones at the originating institution. A growing number of researchers now use modern software development practices, such as writing user-friendly documentation and putting their projects on GitHub, but it can still take weeks or months of effort to replicate the dependencies from the originating institution in a new software environment.
In our case, the CALeDNA consortium includes members from six universities. This means that a researcher at UC Merced who wants to run Anacapa would have to request that the UC Merced research cluster install a long list of specific versions of the R, Python, Perl, and shell bioinformatics utilities that UCLA’s Hoffman cluster provides. The UCLA-based authors of Anacapa may never have had to make such requests for certain software packages, as many of them had already been installed at the request of other researchers who use the Hoffman cluster. UC Merced has a different, independently maintained research cluster, which means a completely different set of pre-installed packages and a different Linux distribution. All of this results in a lot of back and forth between researchers and research cluster administrators at both campuses to debug the many differences that pop up when trying to replicate an exact environment composed of dozens of independent software packages.
When we started working on this project, one particular researcher had already spent two months working on getting the necessary packages installed locally, but had not yet finished them all. We realized we needed a way to simplify the installation of the Anacapa environment so that every new research group could avoid months of setup work.
Anacapa Container
We decided to use the Singularity containerization software as the main dependency for the Anacapa Container. Singularity is an open source container platform developed at Lawrence Berkeley National Laboratory, and it has a security model that works well for university compute cluster users. While looking into other options, we learned that the approach taken by the popular container software Docker requires sudo access. Because university compute clusters cannot grant sudo privileges to individual users, most universities are unable to offer Docker to researchers. Singularity, on the other hand, uses a different technique to load the containerized environment without requiring sudo privileges. Docker is much more popular in the general tech industry, but this issue eliminated it as an option for us. Singularity is a young project that is developing quite rapidly, and it has worked well for this application.
The Anacapa Container itself is a Singularity image that we developed to include all of the software dependencies needed to run the Anacapa Toolkit from UCLA. We have a script, called a Containerfile, that installs each software package step by step into an Ubuntu Linux disk image file. At the end of the process, a single 2GB disk image file can be distributed. Instead of requiring that the numerous dependencies be installed onto new systems, the only dependency is the Singularity runtime. This simplifies the request a researcher has to make to their system administrator.
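For readers unfamiliar with the format, a recipe of this kind reads like a Singularity definition file. The sketch below is illustrative only; the base image and package names are assumptions, not the project’s actual Containerfile.

```
# Illustrative sketch of a Singularity definition file --
# NOT the actual Anacapa Containerfile; base image and
# package names here are assumed for demonstration.
Bootstrap: docker
From: ubuntu:16.04

%post
    # install each software dependency into the disk image, step by step
    apt-get update
    apt-get install -y python3 r-base perl bowtie2
```

Building from such a file produces the single distributable disk image described above (the exact build command varies by Singularity version).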
To make it easy for people on other UC campuses to run Anacapa, we have been involved in getting Singularity installed at five UC campuses. Even though Singularity is a relatively straightforward package, we encountered numerous install errors that had to be corrected through tedious back-and-forth remote technical support with sysadmins. Even with our streamlined approach, which only required that one new package (Singularity) be installed, it was still painful at times. This issue is faced by anyone looking to share resources between institutions, and we hope we can improve this process for others who wish to share analysis environments. Singularity is now up and running at five UC campuses, and any future projects that use Singularity images as a distribution format will require zero new software package requests, as these campuses already have everything they need.
Dat sits in the Anacapa software container and is used to replicate the details of the original Anacapa compute environment. This means that as the container is replicated and reused, folks can use Dat to version and share their new versions of the container environment.
To work with the Anacapa Container, users only need to download the Anacapa Container file and run a Singularity command, and they are in a shell prompt that has all of the Anacapa software pre-installed. Because of the containerization approach that Singularity takes, the heavy compute resources on the host machines are available as native resources in the container, meaning there is no loss in performance (as there is with virtualization-based approaches like VirtualBox).
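For a new user, that workflow might look like the following command transcript. The Dat link and image file name are placeholders for illustration, not real identifiers:

```
# fetch the published container and its version history via Dat
dat clone dat://<anacapa-container-link> anacapa
cd anacapa

# drop into a shell with all Anacapa dependencies pre-installed
singularity shell anacapa.img
```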
Software as data
We often think of data as separate from analysis scripts, software, and compute environments. In reality, these are all types of digital information that can be handled by Dat. By treating software as data, we can approach preservation, versioning, and sharing differently. We like the simplicity of the single-file disk images that Singularity uses, as it fits well with the Dat ethos of sharing your research as one folder that includes your manuscript, your datasets, your paper’s website, and now your entire research software environment.
This is another step towards easy software reproducibility. The image ensures that the exact software versions required are used at runtime. The traditional problem of a system-wide update of a Python package breaking everyone’s existing scripts that depended on the old version is no longer an issue. Researchers can simply load the environment they want by grabbing a specific Singularity image.
By representing the software environment as a file that can be archived along with the dataset, we can ensure future researchers can always quickly get up and running in their quest to reproduce or modify the Anacapa pipeline. One of our partners on this project is the California Digital Library, a group within the University of California that (among other things) develops tools to ensure research datasets can be archived and made accessible forever. The challenge at hand is building a system that can coordinate dataset archiving across the giant distributed system that is the University of California and all of the external research groups that depend on UC data.
We have made this work available through DASH, which is a data repository hosted by California Digital Library. Any UC researcher has access to publish datasets through DASH, and we are hoping Anacapa Container can serve as a model for how to distribute reproducible software as part of the research dataset.
Dat and Singularity
Distributing the research container as a single file means that it can be used in conjunction with Dat as the distribution tool. We have only scratched the surface of the possibilities here, but we are looking forward to more partnerships with data repository providers like California Digital Library and the Internet Archive to build a distributed data archive that includes executable software containers. This would ensure data does not simply go to a data repository never to get used again, as the container would allow for the dataset to become interactive and available instantly.
Cross-posted from https://blog.datproject.org/2018/09/18/announcing-the-dat-anacapa-container/
PIDapalooza 2019 – are you ready to rock!?!
Yes, it’s back and – with your support – it’s going to be better than ever! The third annual PIDapalooza open festival of persistent identifiers will take place at the Griffith Conference Centre, Dublin, Ireland on January 23-24, 2019 – and we hope you’ll join us there!
Hosted, once again, by California Digital Library, Crossref, DataCite, and ORCID, PIDapalooza will follow the same format as past events — rapid-fire, interactive, 30-60 minute sessions (presentations, discussions, debates, brainstorms, etc.) presented on three stages — plus main stage attractions, which will be announced shortly. New for this year is an unconference track, as suggested by several attendees last time.
In the meantime, get those creative juices flowing and send us your session PIDeas! What would you like to talk about? Hear about? Learn about? What’s important for your organization and your community and why? What’s working and what’s not? What’s needed and what’s missing? We want to hear from as many PID people as possible! Please use this form to send us your suggestions. The PIDapalooza Festival Committee will review all forms submitted by September 21, 2018 and decide on the lineup by mid-October.
As a reminder, the regular themes are:
- PID myths: Are PIDs better in our minds than in reality? PID stands for Persistent IDentifier, but what does that mean and does such a thing exist?
- PIDs forever – achieving persistence: So many factors affect persistence: mission, oversight, funding, succession, redundancy, governance. Is open infrastructure for scholarly communication the key to achieving persistence?
- PIDs for emerging uses: Long-term identifiers are no longer just for digital objects. We have use cases for people, organizations, vocabulary terms, and more. What additional use cases are you working on?
- Legacy PIDs: There are thousands of venerable old identifier systems that people want to continue using and bring into the modern data citation ecosystem. How can we manage this effectively?
- Bridging worlds: What would make heterogeneous PID systems ‘interoperate’ optimally? Would standardized metadata and APIs across PID types solve many of the problems, and if so, how would that be achieved? What about standardized link/relation types?
- PIDagogy: It’s a challenge for those who provide PID services and tools to engage the wider community. How do you teach, learn, persuade, discuss, and improve adoption? What’s it mean to build a pedagogy for PIDs?
- PID stories: Which strategies worked? Which strategies failed? Tell us your horror stories! Share your victories!
- Kinds of persistence: What are the frontiers of ‘persistence’? We hear lots about fraud prevention with identifiers for scientific reproducibility, but what about data papers promoting PIDs for long-term access to reliably improving objects (software, pre-prints, datasets) or live data feeds?
We’ll be posting more information on the PIDapalooza website over the coming months, as well as keeping you updated on Twitter (@pidapalooza).
In the meantime, what are you waiting for!? Book your place now — and we also strongly recommend that you book your accommodation early as there are other big conferences in Dublin that week.
PIDapalooza, Dublin, Ireland, January 23-24, 2019 – it’s a date!
Org ID: a recap and a hint of things to come
Over the past couple of years, a group of organizations with a shared purpose—California Digital Library, Crossref, DataCite, and ORCID—invested our time and energy into launching the Org ID initiative, with the goal of defining requirements for an open, community-led organization identifier registry. The goal of our initiative has been to offer a transparent, accessible process that builds a better system for all of our communities. As the working group chair, I wanted to provide an update on this initiative and let you know where our efforts are headed.
Community-led effort
First, I would like to summarize all of the work that has gone into this project, a truly community-driven initiative, over the last two years:
- A series of collaborative workshops were held at the Coalition for Networked Information (CNI) meeting in San Antonio TX (2016), the FORCE11 conference in Portland OR (2016), and at PIDapalooza in Reykjavik (2016).
- Findings from these workshops were summarized in three documents, which we made openly available to the community for public comment:
- A Working Group worked throughout 2017 and voted to approve a set of recommendations and principles for ‘governance’ and ‘product’:
- We then put out a Request for Information that sought expressions of interest from organizations to be involved in implementing and running an organization identifier registry.
- There was a strong response to the RFI. Reviewing the responses and thinking through next steps led to our most recent stakeholder meeting in Girona in January 2018, where ORCID, DataCite, and Crossref were tasked with drafting a proposal that meets the Working Group’s requirements for a community-led organizational identifier registry.
Thank you
I want to take this opportunity to thank everyone who has contributed to this effort so far. We’ve been able to make good progress with the initiative because of the time and expertise many of you have volunteered. We have truly benefited from the support of the community, with representatives from the Alfred P. Sloan Foundation, American Physical Society, California Digital Library, Cornell University, Crossref, DataCite, Digital Science, Editeur, Elsevier, Foundation for Earth Sciences, Hindawi, Jisc, ORCID, Ringgold, Springer Nature, The IP Registry, and U.S. Geological Survey involved throughout this initiative. And we couldn’t have done any of it without the help and guidance of our consultants, Helen Szigeti and Kristen Ratan.
The way forward
The recommendations from our initiative have been converted into a concrete plan for building a registry for research organizations. This plan will be posted in the coming weeks.
The initiative’s leadership group has already secured start-up resourcing and is getting ready to announce the launch plan—more details coming soon.
We hope that all stakeholders will continue to support the next phase of our work — look for announcements in the coming weeks about how to get involved.
As always, we welcome your feedback and involvement as this effort continues. Please contact me directly with any questions or comments at john.chodacki@ucop.edu. And thanks again for your help bringing an open organization identifier registry to fruition!
References
Bilder, G., Brown, J., & Demeranville, T. (2016). Organisation identifiers: current provider survey. ORCID. https://doi.org/10.5438/4716
Cruse, P., Haak, L., & Pentz, E. (2016). Organization Identifier Project: A Way Forward. ORCID. https://doi.org/10.5438/2906
Fenner, M., Paglione, L., Demeranville, T., & Bilder, G. (2016). Technical Considerations for an Organization Identifier Registry. https://doi.org/10.5438/7885
Haak, L., Bilder, G., Brown, C., Cruse, P., Devenport, T., Fenner, M., … Smith, A. (2017). ORG ID WG Product Principles and Recommendations. https://doi.org/10.23640/07243.5402047
Haak, L., Pentz, E., Cruse, P., & Chodacki, J. (2017). Organization Identifier Project: Request for Information. https://doi.org/10.23640/07243.5458162
Pentz, E., Cruse, P., Haak, L., & Warner, S. (2017). ORG ID WG Governance Principles and Recommendations. https://doi.org/10.23640/07243.5402002
This was crossposted from the DataCite blog on Aug 2, 2018: https://doi.org/10.5438/67sj-4y05
A Carpentries-Based Approach to Teaching FAIR Data and Software Principles
originally posted by Chris Erdmann
Recently, I was lucky to participate in an innovative workshop held at Technische Informationsbibliothek (TIB) Hannover from 9–13 July 2018, which paired The Carpentries’ pedagogical style of teaching and lesson material with in-depth background on the FAIR Data Principles. FAIR comprises a set of guiding principles to make data findable, accessible, interoperable, and reusable (Wilkinson et al., 2016). FAIR is a relatively new initiative that is gaining momentum, and key stakeholders across the research lifecycle are exploring how the underlying FAIR Principles can be applied and assessed at various points. For instance, at the 2018 Research Data Alliance (RDA) Plenary Meeting in Berlin, FAIR was mentioned in at least 23 of the sessions.
Researchers are already starting to ask, How can my research be more FAIR? Thanks to Angelina Kraft (team lead, research data and scientific software) and Katrin Leinweber (research assistant) at TIB Hannover, we have a head start on developing a training program for the research community on FAIR Data and Software. Angelina and Katrin were joined by Carpentries instructors Konrad Foerstner (ZB MED – Informationszentrum Lebenswissenschaften), Martin Hammitzsch (Das Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum), Luke Johnston (Aarhus University), and Mateusz Kuzak (Dutch Techcentre for Life Sciences) in contributing to the workshop lesson materials and notes. These can be found at the workshop website, while the slides can be viewed in the 2018-07-09-FAIR-Data-and-Software-TIB-workshop Google Drive folder (or as PDFs). All the materials are openly shared in the hope that others will reuse and develop them further. Video recordings will also be available at the TIB AV-Portal. In addition, other participants and I tweeted non-stop to document the workshop for those following along remotely via the hashtag #TIBFDS.
Katrin and Angelina demonstrated that you can successfully pair background information on the FAIR Principles with the hands-on examples taught in The Carpentries. For others hoping to better prepare for FAIR and train their communities in the principles, the TIB Hannover workshop serves as an excellent starting point. I know a number of us in Library Carpentry will be working with Angelina and Katrin to further develop their material.
Crossposted from Library Carpentry blog: https://librarycarpentry.org/blog/2018/07/24/tib-hannover-fair-report/
Internet Archive, Code for Science and Society, and California Digital Library to Partner on a Data Sharing and Preservation Pilot Project
In 2017, CDL joined Code for Science & Society (CSS) on the Dat in the Lab project. The Moore Foundation-funded project is currently piloting the use of the Dat Protocol in UC labs for data capture, data storage, and data sharing.
As our first year comes to a close, the project team has started looking to expand our pilot to see how Dat could be utilized in the field of preservation. Below is an announcement cross-posted from the Internet Archive blog on June 6, 2018 that describes the next effort from our project team.
Research and cultural heritage institutions are facing increasing costs to provide long-term public access to historically valuable collections of scientific data, born-digital records, and other digital artifacts. With many institutions moving data to cloud services, data sharing and access costs have become more complex.
As leading institutions in decentralization and data preservation, the Internet Archive (IA), Code for Science & Society (CSS) and California Digital Library (CDL) will work together on a proof-of-concept pilot project to demonstrate how decentralized technology could bolster existing institutional infrastructure and provide new tools for efficient data management and preservation. Using the Dat Protocol (developed by CSS), this project aims to test the feasibility of a decentralized network as a new option for organizations to archive and monitor their digital assets.
Dat is already being used by diverse communities, including researchers, developers, and data managers. California Digital Library is building innovative tools for data publication and digital preservation. The Internet Archive is leading efforts to advance the decentralized web community. This joint project will explore the issues that emerge from collecting institutions adopting decentralized technology for storage and preservation activities.
The pilot will feature a defined corpus of open data from CDL’s data sharing service. The project aims to demonstrate how members of a cooperative, decentralized network can leverage shared services to ensure data preservation while reducing storage costs and increasing replication counts.
By working with the Dat Protocol, the pilot will maximize openness, interoperability, and community input. Linking institutions via cooperative, distributed data sharing networks has the potential to achieve efficiencies of scale not possible through centralized or commercial services. The partners intend to openly share the outcomes of this proof-of-concept work to inform further community efforts to build on this potential.
Want to learn more? Representatives of this project will be at FORCE2018, the Joint Conference on Digital Libraries, Open Repositories, the DLF Forum, and the Decentralized Web Summit.
————————
More about CSS: Code for Science & Society is a nonprofit organization committed to building public interest technology and low-cost decentralized tools with the Dat Project to help people share and preserve versioned digital information. Read more about CSS's Dat in the Lab project, our recent Community Call, and other activities. (Contact: Danielle Robinson)
More about IA: The Internet Archive is a non-profit digital library with the mission to provide “universal access to all knowledge.” It works with hundreds of national and international partners providing web, data, and preservation services and maintains an online library comprising millions of freely-accessible books, films, audio, television broadcasts, software, and hundreds of billions of archived websites. https://archive.org/. (Contact: Jefferson Bailey)
For more information from CDL and UC3, contact uc3@ucop.edu, visit https://uc3.cdlib.org, or follow @UC3CDL
A portion of this post was cross-posted from the Internet Archive blog on June 6, 2018.
Job Opening: UC3 Product Manager (EZID) / Research Data Specialist
California Digital Library (CDL) has built a strong portfolio of innovative projects and initiatives concerned with promoting the use of persistent identifiers throughout the scholarly communication ecosystem. Our work has ranged from experimentation and thought leadership to global PID service offerings.
As the home of the ARK standard and the N2T resolver, as well as an institutional member of DataCite, Crossref, and ORCID, CDL is dedicated to innovating and sustaining the social and technical infrastructure that enables the open sharing and publication of all legitimate research outputs. This dedication has manifested in our work on data citations, DOIs for data, DOIs for publications, ARKs, ORCID iDs, YAMZ, compact identifiers, organizational identifiers, data metrics, PID events, and more.
The centerpiece of these efforts is EZID, a service that makes persistent identifiers easy. Since its launch, EZID has grown into an internationally recognized service with partners around the globe, including representatives from the academic, government, nonprofit, and commercial sectors. Last August, the EZID program began transitioning its non-UC DOI services. This was done to refocus our community's time and resources on DataCite and Crossref (as community infrastructure organizations) and to free up capacity within CDL for new PID projects. We are now looking for ways to leverage this capacity and our expertise in the PID space.
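As an illustration of what "making persistent identifiers easy" looks like in practice, EZID exposes a simple HTTP API that accepts metadata in ANVL format: one "name: value" pair per line, with percent signs and line breaks percent-encoded so each pair stays on a single line. The sketch below is illustrative, not official client code; the test shoulder (doi:10.5072/FK2) and test account (apitest) shown in the comments are, to our knowledge, EZID's documented test values, but check the EZID API documentation before relying on them.

```python
def anvl_encode(metadata):
    """Serialize a dict of metadata into ANVL, the line-oriented
    'name: value' format that EZID accepts. Percent signs, newlines,
    and carriage returns are percent-encoded so that every pair
    occupies exactly one line."""
    def escape(s):
        return (s.replace("%", "%25")
                 .replace("\n", "%0A")
                 .replace("\r", "%0D"))
    return "\n".join(
        f"{escape(name)}: {escape(value)}"
        for name, value in metadata.items()
    )

# Example metadata for minting a test DOI. The datacite.* field
# names follow EZID's DataCite metadata profile; _target is the
# URL the identifier should resolve to.
metadata = {
    "datacite.creator": "Doe, Jane",
    "datacite.title": "A Sample Dataset",
    "datacite.publisher": "California Digital Library",
    "datacite.publicationyear": "2018",
    "_target": "https://example.org/dataset/1",
}

body = anvl_encode(metadata)

# Minting is then a single authenticated POST to a "shoulder"
# (an identifier prefix). Shown commented out to avoid a live request:
#
# import requests
# resp = requests.post(
#     "https://ezid.cdlib.org/shoulder/doi:10.5072/FK2",  # EZID test shoulder
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "text/plain; charset=UTF-8"},
#     auth=("apitest", "apitest"),  # EZID test account
# )
# # A successful response body begins with "success: doi:10.5072/FK2..."
```

The ANVL encoding is the only subtle part: without the percent-encoding step, a newline inside a metadata value would be parsed as the start of a new name/value pair.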
Now Hiring
With the upcoming retirement of Joan Starr, we are looking for an experienced product manager to direct strategic planning, open source development, and community partnerships that will further enhance and extend CDL’s strategic leadership in scholarly PIDs. This is an exciting opportunity to evaluate the potential for innovation, develop a compelling plan for how best to position CDL as an agent for positive change, and truly impact the PID landscape.
Next steps
If you are excited by this opportunity, we hope you will apply. The position reports to the Director of the University of California Curation Center (UC3), CDL's digital curation program. UC3 is also home to CDL's systems and initiatives supporting digital preservation, data publishing, research data management, and data-skills training for librarians. Additional information on this position is available here: https://jobs.ucop.edu/applicants/Central?quickFind=61143.
New hire: Library Carpentry Community & Development Director
We’re excited to announce that Chris Erdmann has been hired as the Library Carpentry Community and Development Director starting May 4, 2018.
Chris has worked in libraries for more than 21 years, integrating data management and workflows into database and library systems, and has partnered with research and library communities through training, consulting, and tool development to build programs that empower people to work effectively with data. He received his MLIS at the University of Washington iSchool while working at the university's Technology Transfer Office, where he helped automate workflows and develop the unit's web presence and analytics. He then spent roughly ten years working alongside astronomers at the European Southern Observatory (ESO) and the Harvard-Smithsonian Center for Astrophysics, advancing library data mining and linking services such as the ESO Telescope Bibliography.
During this time he also led an experimental training series, Data Scientist Training for Librarians, geared toward teaching librarians the data-savvy skills needed to transform their library services to meet the needs of their research communities, and he recently joined the Library Carpentry governance group. He is a co-author, with Matt Burton, Liz Lyon, and Bonnie Tijerina, of the recent report Shifting to Data Savvy: The Future of Data Science in Libraries, which highlights Library Carpentry and The Carpentries as a necessary next step for libraries looking to advance their research services.
Chris will be working with the Library Carpentry community and The Carpentries to start mapping out the infrastructure for growing the community, formalizing lesson development processes, expanding its pool of instructors, and inspiring more instructor trainers to meet the demand for Library Carpentry workshops around the globe and reach new regions and communities.
While this new position is hosted by the University of California Curation Center (UC3), the digital curation program of the California Digital Library (CDL), it is intended to support the work of the Library Carpentry governance committee on streamlining operations within The Carpentries. The position is funded by IMLS and focused on determining standard curriculum, growing instructor training for librarians and planning for community events like the upcoming Mozilla Sprint on Library Carpentry materials.
We are excited to have Chris on board! Feel free to reach out to him via Twitter (@libcce), GitHub, or LinkedIn.
For more information on Library Carpentry: https://librarycarpentry.github.io & @libcarpentry