Category: Digital Curation

Posts from the wider world of digital curation.

Dat-in-the-Lab: Announcing UC3 research collaboration

We are excited to announce that the Gordon and Betty Moore Foundation has awarded a research grant to the California Digital Library and Code for Science & Society (CSS) for the Dat-in-the-Lab project to develop practical new techniques for effective data management in the academic research environment.

Dat-in-the-Lab

The project will pilot the use of CSS’s Dat system to streamline data preservation, publication, sharing, and reuse in two UC research laboratories: the Evolution: Ecology, Environment lab at UC Merced, focused on basic ecological and evolutionary research under the direction of Michael Dawson; and the Center for Watershed Sciences at UC Davis, dedicated to the interdisciplinary study of water challenges.  UC researchers are increasingly faced with demands for proactive and sustainable management of their research data with respect to funder mandates, publication requirements, institutional policies, and evolving norms of scholarly best practice.  With the support of the UC Davis and UC Merced Libraries, the project team will conduct a series of site visits to the two UC labs in order to create, deploy, evaluate, and refactor Dat-based data management solutions built for real-world data collection and management contexts, along with outreach and training materials that can be repurposed for wider UC or non-UC use.  

What is Dat?

The Dat system enables effective research data management (RDM) through continuous data versioning, efficient distribution and synchronization, and verified replication.  Dat lets researchers continue to work with the familiar paradigm of file folders and directories yet still have access to rich, robust, and cryptographically-secure peer-to-peer networking functions.   You can think of Dat as doing for data what Git has done for distributed source code control.  Details of how the system works are explained in the Dat whitepaper.
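
To make "verified replication" concrete, here is a minimal Python sketch of the general idea: fingerprint every file in a folder with a cryptographic hash, then compare the fingerprints of an original and a copy. It is an illustration only, not Dat's actual protocol or API (the whitepaper covers that). The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def folder_manifest(root: Path) -> dict:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_replica(source: Path, replica: Path) -> list:
    """Return the relative paths that are missing, extra, or different in the replica."""
    src = folder_manifest(source)
    rep = folder_manifest(replica)
    # A path is reported when it exists on only one side or its digests differ.
    return sorted(p for p in src.keys() | rep.keys() if src.get(p) != rep.get(p))
```

An empty list from `verify_replica` means the copy matches the original file for file. Any path it returns points to a file that should be re-synchronized.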

Project partners

Dat-in-the-Lab is the latest expression of CDL’s longstanding interest in supporting RDM at the University of California, and is complementary to other initiatives such as the DMPTool for data management planning, the Dash data publication service, and active collaboration with local campus-based RDM efforts.  CSS is a non-profit organization committed to improving access to research data for the public good, and works at the intersection of technology with science, journalism, and government to promote openness, transparency, and collaboration.  Dat-in-the-Lab activities will be coordinated by Max Ogden, CSS founder and director; Danielle Robinson, CSS scientific and partnerships director; and Stephen Abrams, associate director of the CDL’s UC Curation Center (UC3).

Learn more

Stay tuned for monthly updates on the project. You can bookmark Dat-in-the-Lab on GitHub for access to code, curricula, and other project outputs.  Also follow along as the project evolves on our roadmap, chat with the project team, and keep up to date through the project Twitter feed.  For more information about UC3, contact us at uc3@ucop.edu and follow us on Twitter.

PIDapalooza is back!

PIDapalooza is back, by popular demand!  We’re building on the best of the inaugural PIDapalooza and organizing two days packed with discussions, demos, informal and interactive sessions, updates, talks by leading PID innovators, and more. There will be lots of opportunities to network – and to learn from and engage with PID enthusiasts from around the world. All in a fun, relaxed, and welcoming atmosphere!

We’re looking for your PIDeas! Want to update the community on your current PID projects? Brainstorm new ones? Bring together experts with different perspectives on PID-related topics? Find out what’s new in PID-land? Share your experiences of creating, innovating, or communicating about PIDs? We welcome your proposals for energetic, exciting, and thoughtful rapid-fire sessions related to our eight festival themes:

  1. PID myths.  Are PIDs a dream or reality?  PID stands for Persistent IDentifier, but what does that mean and does such a thing exist?
  2. Achieving persistence.  So many factors affect persistence: resolvability, mission, oversight, funding, succession, redundancy, governance.  Is open infrastructure for scholarly communication the key to achieving persistence?
  3. PIDs for emerging uses.  Long-term identifiers are no longer just for digital objects.  PIDs are used for people, organizations, resources, vocabulary terms, and more. What are you identifying?
  4. Legacy PIDs.  There are thousands of venerable identifier systems that people want to bring into the modern research information ecosystem.  How can we manage this effectively?
  5. Bridging worlds.  What would optimize the interoperation of PID systems?  Would standardized metadata and APIs across PID types solve many of the problems, and if so, how would that be achieved?  What about standardized link/relation types?
  6. PIDagogy.  It’s a challenge for those who provide PID services and tools to engage the wider community. How do you teach, learn, persuade, discuss, and improve adoption? What’s it mean to build a pedagogy for PIDs?
  7. PID stories.  Which strategies work?  Which strategies fail?  Tell us your horror stories! Share your victories!
  8. Kinds of persistence.  What are the frontiers of ‘persistence’? We hear lots about rigor and reproducibility, but what about data papers promoting PIDs for long-term access to objects that change over time, like software or live data feeds?

Please use this short form to tell us about your proposed session. The program committee will review all suggestions received by the proposal deadline, and we’ll let you know whether you’ve been successful by the first week of October.

We’ll be posting more information about the festival lineup on the PIDapalooza website and on Twitter (@PIDapalooza) in the coming weeks. We hope to see you in January!

PIDapalooza – the details

Where: Auditori Palau de Congressos de Girona, Passeig de la Devesa, 35, Girona, Catalonia, Spain
When: 23rd and 24th January 2018
Deadline for proposals: September 18 – please use this short form to submit session(s)

RDA-DMP movings and shakings

An update on RDA and our Active DMP work, courtesy of Stephanie Simms

RDA Plenary 9 
We had another productive gathering of #ActiveDMPs enthusiasts at the Research Data Alliance (RDA) plenary meeting in Barcelona (5-7 Apr). Just prior to the meeting we finished distilling all of the community’s wonderful ideas for machine-actionable DMP use cases into a white paper that’s now available in RIO Journal. Following on the priorities outlined in the white paper, the RDA Active DMPs Interest Group session focused on establishing working groups to carry things forward. There were 100+ participants packed into the session, both physically and virtually, representing a broad range of stakeholders and national contexts, and many volunteered to contribute to five proposed working groups (meeting notes here):
  1. DMP common standards: define a standard for expression of machine-readable and -actionable DMPs
  2. Exposing DMPs: develop use cases, workflows, and guidelines to support the publication of DMPs via journals, repositories, or other routes to making them open
  3. Domain/infrastructure specialization: explore disciplinary tailoring and the collection of specific information needed to support service requests and use of domain infrastructure
  4. Funder liaison: engage with funders, support DMP review ideas, and develop specific use cases for their context
  5. Software management plans: explore the remit of DMPs and the inclusion of other output types, such as software and workflows

The first two groups are already busy drafting case statements. And just a note about the term “exposing” DMPs: everyone embraced using this term to describe sharing, publishing, depositing, etc. activities that result in DMPs becoming open, searchable, useful documents (also highlighted in a recent report on DMPs from the University of Michigan by Jake Carlson). If you want to get involved, you can subscribe to the RDA Active DMPs Interest Group mailing list and connect with these distributed, international efforts.

Another way to engage is by commenting on recently submitted Horizon2020 DMPs exposed on the European Commission website (unfortunately, the commenting period has closed for two of them, but one remains open until 15 May).

DMPRoadmap update
Back at the DMPRoadmap ranch, we’re busy working toward our MVP (development roadmap and other documentation available on the GitHub wiki). The MVP represents the merging of our two tools with some new enhancements (e.g., internationalization) and UX contributions to improve usability (e.g., redesign of the create plan workflow) and accessibility. We’ve been working through fluctuating developer resources and will update/confirm the estimated timelines for migrating to the new system in the coming weeks; current estimates are end of May for DMPonline and end of July for DMPTool. Some excellent news is that Bhavi Vedula, a seasoned contract developer for UC3, is joining the team to facilitate the DMPTool migration and help get us to the finish line. Welcome Bhavi!

In parallel, we’re beginning to model some active DMP pilot projects to inform our work on the new system and define future enhancements. The pilots are also intertwined with the RDA working group activities, with overlapping emphases on institutional and repository use cases. We will begin implementing use cases derived from these pilots post-MVP to test the potential for making DMPs active and actionable. More details forthcoming…

Upcoming events
The next scheduled stop on our traveling roadshow for active DMPs is the RDA Plenary 10 meeting in Montreal (19–21 Sept 2017), where working groups will provide progress updates. We’re also actively coordinating between the RDA Active DMPs IG and the FORCE11 FAIR DMPs group to avoid duplication of effort. So there will likely be active/FAIR/machine-actionable DMP activities at the next FORCE11 meeting in Berlin (25–27 Oct)—stay tuned for details.

And there are plenty of other opportunities to maintain momentum, with upcoming meetings and burgeoning international efforts galore. We’d love to hear from you if you’re planning your own active DMP things and/or discover anything new so we can continue connecting all the dots. To support this effort, we registered a new Twitter handle @ActiveDMPs and encourage the use of the #activeDMPs hashtag.

Until next time

On the right track(s) – DCC release draws nigh

blog post by Sarah Jones

Eurostar from Flickr by red hand records CC-BY-ND

Preliminary DMPRoadmap out to test

We’ve made a major breakthrough this month, getting a preliminary version of the DMPRoadmap code out to test on DMPonline, DMPTuuli and DMPMelbourne. This has taken longer than expected but there’s a lot to look forward to in the new code. The first major difference users will notice is that the tool is now lightning quick. This is thanks to major refactoring to optimise the code and improve performance and scalability. We have also reworked the plan creation wizard, added multi-lingual support, ORCID authentication for user profiles, on/off switches for guidance, and improved admin controls to allow organisations to upload their own logos and assign admin rights within their institutions. We will run a test period for the next 1-2 weeks and then move this into production for DCC-hosted services.

Work also continues on additional features needed to enable the DMPTool team to migrate to the DMPRoadmap codebase. This includes enhancements to existing features, a statistics dashboard, an email notifications dashboard, a public DMP library, template export, the ability to create plans and templates from existing ones, and flagging of “test” plans (see the Roadmap to MVP on the wiki to track our progress). We anticipate this work will be finished in August and the DMPTool will migrate over the summer. When we issue the full release we’ll also provide a migration path and documentation so those running instances of DMPonline can join us in the DMPRoadmap collaboration.

Machine-actionable DMPs

Stephanie and Sarah are also continuing to gather requirements for machine-actionable DMPs. Sarah ran a DMP workshop in Milan last month where we considered what tools and systems need to connect with DMPs in an institutional context, and Stephanie has been working with Purdue University and UCSD to map out the institutional landscape. The goal is to produce maps/diagrams for two specific institutions and extend the exercise to others to capture more details about practices, workflows, and systems. All the slides and exercise from the DMP workshop in Milan are on the Zenodo RDM community collection, and we’ll be sharing a write-up of our institutional mapping in due course. I’m keen to replicate the exercise Stephanie has been doing with some UK unis, so if you want to get involved, drop me a line. We have also been discussing potential pilot projects with the NSF and Wellcome Trust, and have seen the DMP standards and publishing working groups proposed at the last RDA plenary host their initial calls. Case statements will be out for comment soon – stay tuned for more!

We have also been discussing DMP services with the University of Queensland in Australia, who are doing some great work in this area, and will be speaking with BioSharing later this month about connecting up so we can start to trial some of our machine-actionable DMP plans.

The travelling roadshow

Our extended network has also been helping us to disseminate DMPRoadmap news. Sophie Hou of NCAR (National Center for Atmospheric Research) took our DMP poster to the USGS Community for Data Integration meeting (Denver, CO 16–19 May) and Sherry Lake will display it next at the Dataverse community meeting (Cambridge, MA 14–16 June). We’re starting an inclusive sisterhood of the travelling maDMPs poster. Display the poster, take a picture, and go into the Hall of Fame! Robin Rice and Josh Finnell have also been part of the street team taking flyers to various conferences on our behalf. If you would like a publicity pack, Stephanie will send out stateside and Sarah will share through the UK and Europe. Just email us your contact details and we’ll send you materials. The next events we’ll be at are the Jisc Research Data Network in York, the EUDAT and CODATA summer schools, the DataONE Users Group and Earth Science Information Partners meetings (Bloomington, IN), the American Library Association Annual Conference (Chicago, IL), and the Ecological Society of America meeting (Portland, OR). Catch up with us there!
