Author: Daniella Lowenberg

Co-Author ORCiDs in Dash

Recently, the Dash team enabled ORCiD login. While this is important for primary authors, the Dash team feels strongly that all contributors to a data publication should get credit for their work.

All co-authors of a published dataset now have the ability to authenticate and attach their ORCiD in Dash.

How this works:

  1. Data are published by a corresponding author, who can authenticate their own ORCiD but cannot enter ORCiD iDs for co-authors. For this reason, Dash provides a space for co-author email addresses.
  2. If co-author email addresses are entered, each co-author will receive an email notification when the data are published. This notification includes a note about ORCiD iDs and a URL that leads to Dash.
  3. Co-authors who click this URL are directed to a pop-up box over the dataset landing page, which sends them to ORCiD for login and authentication.
  4. After an ORCiD iD is entered and authenticated, the author is returned to the Dash landing page for their dataset, and their ORCiD iD appears next to their name.
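The notification rule in steps 1 and 2 can be expressed as a short filter. The sketch below is a hypothetical illustration rather than Dash's own code: the `Author` record, its fields, and `coauthors_to_notify` are names invented here, and it assumes that only co-authors whose email address was entered are contacted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Author:
    """Hypothetical author record; the field names are illustrative."""
    name: str
    email: Optional[str] = None
    orcid: Optional[str] = None
    is_corresponding: bool = False


def coauthors_to_notify(authors: List[Author]) -> List[Author]:
    """Co-authors who get the post-publication email (steps 1-2).

    The corresponding author authenticates their own ORCiD, so only
    co-authors whose email address was entered are notified.
    """
    return [a for a in authors if not a.is_corresponding and a.email]


authors = [
    Author("A. Corresponding", "corr@example.edu", "0000-0002-1825-0097", True),
    Author("B. Coauthor", "b@example.edu"),
    Author("C. Coauthor"),  # no email entered, so no notification
]
print([a.name for a in coauthors_to_notify(authors)])  # ['B. Coauthor']
```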

 

Dash Enables ORCiD Login

The Dash team has added a second way to log in and submit. In addition to Single Sign-On, users can now log in with ORCiD. Not only can you authenticate with ORCiD, but once you have logged in this way, your ORCiD iD is connected to your Dash account. The next time you submit to Dash, your ORCiD iD will be auto-populated in your submission form.

To back up a little: ORCiD is a persistent identifier used to distinguish researchers from one another and to connect researchers with their research. If you are a researcher and do not currently have an ORCiD, sign up!
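For the curious, an ORCiD iD is written as four groups of four characters, and the last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm (it can be 0 to 9 or X). The sketch below is a standalone illustration, not something Dash exposes; `is_valid_orcid` and `orcid_check_digit` are names chosen here, and passing the check only means an iD is well formed, not that it is registered with ORCiD.

```python
import re

# 16 characters in four groups of four; only the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```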

To connect your ORCiD:

  1. Log in using the button on the far right of the Dash homepage.
  2. Here you will see two options. Clicking the top ORCiD button will send you to the ORCiD authentication page and, after you correctly enter your ORCiD information, send you back to Dash.
  3. Although you have now authenticated with ORCiD, you will be asked to choose your Single Sign-On so that your account is connected to the correct submitting instance (a campus, a department, DataONE, etc.). This is the only time you will be asked to log in twice.
  4. After successfully logging in with Single Sign-On, your account will be connected to your ORCiD. In the future, you will not need to repeat this process: you can save your login in your browser or use either of the two login options. If you have already submitted to Dash before, you may log out and go through the same steps above. This process ties your ORCiD to your existing account and allows you to use either ORCiD or Single Sign-On in the future.
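The one-time double login in steps 3 and 4 amounts to linking two identities to a single account. The following is a simplified, hypothetical in-memory model, assuming accounts are keyed by the Single Sign-On identity; `AccountDirectory`, its methods, and the example identities are illustrative names, not Dash internals.

```python
from typing import Dict, Optional


class AccountDirectory:
    """Hypothetical in-memory model of the ORCiD + SSO login steps.

    Accounts are keyed by Single Sign-On identity, which ties a user to a
    submitting instance (campus, department, DataONE, ...). An ORCiD iD is
    a second key that points at the same account once it has been linked.
    """

    def __init__(self) -> None:
        self.accounts: Dict[str, dict] = {}    # SSO identity -> account
        self.orcid_index: Dict[str, str] = {}  # ORCiD iD -> SSO identity

    def login_with_sso(self, sso_id: str, tenant: str) -> dict:
        """Return the account for this SSO identity, creating it if new."""
        if sso_id not in self.accounts:
            self.accounts[sso_id] = {"sso_id": sso_id, "tenant": tenant, "orcid": None}
        return self.accounts[sso_id]

    def login_with_orcid(self, orcid: str) -> Optional[dict]:
        """Return the linked account, or None if the SSO step is still needed."""
        sso_id = self.orcid_index.get(orcid)
        return None if sso_id is None else self.accounts[sso_id]

    def link(self, orcid: str, sso_id: str, tenant: str) -> dict:
        """Step 3: tie an authenticated ORCiD iD to an existing or new account."""
        account = self.login_with_sso(sso_id, tenant)
        account["orcid"] = orcid
        self.orcid_index[orcid] = sso_id
        return account


directory = AccountDirectory()
orcid = "0000-0002-1825-0097"
assert directory.login_with_orcid(orcid) is None     # first visit: SSO required
account = directory.link(orcid, "jdoe@campus.example", "ucop")
assert directory.login_with_orcid(orcid) is account  # later: ORCiD alone works
assert directory.login_with_sso("jdoe@campus.example", "ucop") is account
```

Keying on the SSO identity reflects why that step is required once: it is what ties an account to a submitting instance, while the ORCiD iD then serves as an alternative way in.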

Dash: The Data Publication Tool for Researchers

This post has been cross-posted on Medium.

We all know that research data should be archived and shared. That’s why we created Dash, a data publishing platform that is free to UC researchers. Dash complies with journal and funder requirements, follows best practices, and is easy to use. In addition, we are continuously developing new features to better integrate with your research workflow.

Why Dash is the best solution for UC researchers:

  • Data are archived indefinitely. You can use Dash to ensure all of your research data will be available even after you get a new computer or switch institutions. Beyond that, your data will have all the important associated documentation on the funding sources for the research, the research methods and equipment used, and readme files on how your data were processed, so that future researchers in your own lab or around the world can use your work.
  • Data can be published at any time. While we have features that support publishing an associated article, such as keeping your data private during the review process, data publications do not need to be associated with an article. You can publish your data at any point.
  • Data can be versioned. As you update and optimize protocols, or do further analysis on your data, you may update your data files or documentation. Your DOI will always resolve to a landing page listing all versions of the dataset.
  • Data can be uploaded to Dash directly from your computer or through a “manifest”. A “manifest” means you may enter up to 1000 URLs where your data live on servers, Box, Dropbox, or Google Drive, and the data will be transferred to Dash without you waiting several hours or dealing with timeouts.
  • You can upload up to 100 GB of data per submission.
  • Dash does not limit file type. As long as the data are within the size limits listed above, publications can be image data, tabular data, qualitative data, and more.
  • Related works can be linked. Code, articles, other datasets, and protocols can be linked to your data for a more comprehensive package of your research.
  • Data deposited to Dash receive a DOI. This means that your data can be located and that you can cite your data as you would an article. The landing page for each dataset also includes an author list for your citation, so each author who contributed to the data collection and analysis can receive credit for their work.
  • Data are assigned an open license. Deposited data are publicly available for re-use by anyone under a Creative Commons license. You put many hours and coffees into producing these data; public release will give your research a broader reach. A light reminder: your name is still associated with the data, and making your data public does not mean you are “giving away” your work.
  • Dash is a UC project. Dash can be customized per campus, many campus libraries are subsidizing the cost of storage, and it is developed by the University of California Curation Center (UC3), meaning this service is set up to serve your needs.

We hear a lot about the cost of storage being an inhibitor. But on many campuses, the storage costs associated with Dash are subsidized by academic libraries or departments. Storage costs can also be written into grants, as funders do require data to be archived.

We are always looking for feedback on what features would be the most useful, so that we can make data publishing a part of your normal workflows. Get in touch with us or start using Dash to archive and share your data.

Disambiguating Dash and Merritt

What’s Dash? What’s Merritt? What’s the difference? After numerous questions about where things should go and what the differences are between our UC3 services, we got the hint that we are not communicating clearly.

Clearing things up

A group of us sat down and talked through different use cases and the wording that was causing such confusion, and we have come up with what we hope is a disambiguation of Dash versus Merritt.


Different intentions, different target users

While Dash and Merritt interact with each other at a technical level, they have different purposes, and users should not compare the two services as if they were alternatives. Dash is designed for researchers: its user interface, user experience, and metadata schema are optimized for use by individual researchers. Merritt is designed for use by institutional librarians, archivists, and curators.

Because of the different intended purposes, features, and users, UC3 does not recommend that Merritt be advertised to researchers on Research Data Management (RDM) sites or researcher-facing Library Guides.

Below are quick descriptions of each service that should clarify intentions and target users:

  • Dash is an open data publication platform for researchers. Self-service depositing of research data through Dash fulfills publisher, funder, and data management plan requirements regarding data sharing and preservation. When researchers publish their datasets through Dash, their datasets are issued a DOI to optimize citability, are publicly available for download and re-use under a CC BY 4.0 or CC-0 license, and are preserved in Merritt, California Digital Library’s preservation repository. Dash is available to researchers at participating UC campuses, as well as researchers in Environmental and Earth Sciences through the DataONE network.
  • Merritt is a preservation repository for mediated deposits by UC organizations. We work with staff at UC libraries, archives, and departments to preserve digital assets and collections. Merritt offers bit-level preservation and replication with either public or private access. Merritt is also the preservation repository that preserves Dash-deposited data.

The cost of service vs. the cost of storage

California Digital Library does not charge individual users for the Dash or Merritt services. However, we do recharge your institution annually for the amount of storage used in Merritt (remember, Dash preserves data in Merritt). On most campuses, the Library fully subsidizes Dash storage costs, so there is no extra financial obligation for individual researchers depositing data into Dash.

Follow-up

If you have any questions about edge cases or would like to know any more details about the architecture of the Dash platform or Merritt repository, please get in touch at uc3@ucop.edu.

And while you’re here: check out Dash’s new features for uploading large data sets, and uploading directly from the cloud.

Cirrus-ly Convenient Uploading

That was a cloud pun! Following our release two weeks ago, the Dash team is thrilled to present our newest functionality: you may now upload files directly from Box, Dropbox, and Google Drive!

Let’s get you publishing (and citing and getting credit for your data):

  • Using the “upload from server” option, you may enter up to 1000 URLs (and up to 100 GB per submission) by pasting in sharing links from Box, Dropbox, or Google Drive.


  • Validate the files, and your URLs will appear along with their filenames and sizes.


  • Submit & download.
    • Files uploaded from Box, Dropbox, and Google Drive will download in the same format in which they were stored in the cloud.
    • Google Docs, Sheets, and Slides will download as Microsoft Word documents, Excel spreadsheets, and PowerPoint presentations, respectively.
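The download behavior above can be sketched as a lookup table. The Google Drive MIME types for native Docs, Sheets, and Slides files and the Office Open XML MIME types are standard values, but the assumption that Dash exports to .docx, .xlsx, and .pptx (rather than older Office formats), and the `download_name` helper itself, are hypothetical.

```python
# MIME types Google Drive uses for native Docs, Sheets, and Slides files,
# mapped to the Office Open XML formats they can be exported as.
# Assumption: the modern .docx/.xlsx/.pptx formats are the ones used.
GOOGLE_EXPORTS = {
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".docx",
    ),
    "application/vnd.google-apps.spreadsheet": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".xlsx",
    ),
    "application/vnd.google-apps.presentation": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        ".pptx",
    ),
}


def download_name(filename: str, mime_type: str) -> str:
    """Name a cloud-uploaded file would download under (hypothetical helper).

    Ordinary files keep their name and format; Google-native files, which
    usually have no file extension of their own, gain the Office one.
    """
    export = GOOGLE_EXPORTS.get(mime_type)
    return filename if export is None else filename + export[1]


print(download_name("Field notes", "application/vnd.google-apps.document"))
# Field notes.docx
print(download_name("results.csv", "text/csv"))
# results.csv
```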

We will be updating our help and FAQ pages this week to reflect our new features, but in the meantime please let us know if you have any questions or feedback.

Library Carpentry Sprint at UC Berkeley

The UC Berkeley Library is participating in the worldwide Library Carpentry Sprint happening on June 1st and 2nd, which is a part of the larger Mozilla Global Sprint 2017. Library Carpentry is a part of the Software Carpentry and Data Carpentry family, and it strives to bring the fundamentals of computing, as well as a …

Source: Library Carpentry Sprint at UC Berkeley

Great talks and fun at csv,conf,v3 and Carpentry Training

“Day1 @CSVConference! This is the coolest conf I ever been to #csvconf pic.twitter.com/ao3poXMn81” (Yasmina Anwar, @yasmina_anwar, May 2, 2017)

On May 2–5, 2017, I (Yasmin AlNoamany) was thrilled to attend the csv,conf,v3 2017 conference and the Software/Data Carpentry instructor training in Portland, Oregon, USA. It was a unique experience to attend and speak with many …

Source: Great talks and fun at csv,conf,v3 and Carpentry Training

RDA-DMP movings and shakings

An update on RDA and our Active DMP work, courtesy of Stephanie Simms

RDA Plenary 9 
We had another productive gathering of #ActiveDMPs enthusiasts at the Research Data Alliance (RDA) plenary meeting in Barcelona (5-7 Apr). Just prior to the meeting we finished distilling all of the community’s wonderful ideas for machine-actionable DMP use cases into a white paper that’s now available in RIO Journal. Following on the priorities outlined in the white paper, the RDA Active DMPs Interest Group session focused on establishing working groups to carry things forward. There were 100+ participants packed into the session, both physically and virtually, representing a broad range of stakeholders and national contexts, and many volunteered to contribute to five proposed working groups (meeting notes here):
  • DMP common standards: define a standard for expression of machine-readable and -actionable DMPs
  • Exposing DMPs: develop use cases, workflows, and guidelines to support the publication of DMPs via journals, repositories, or other routes to making them open
  • Domain/infrastructure specialization: explore disciplinary tailoring and the collection of specific information needed to support service requests and use of domain infrastructure
  • Funder liaison: engage with funders, support DMP review ideas, and develop specific use cases for their context
  • Software management plans: explore the remit of DMPs and inclusion of different output types, e.g., software and workflows too

The first two groups are already busy drafting case statements. And just a note about the term “exposing” DMPs: everyone embraced using this term to describe sharing, publishing, depositing, etc. activities that result in DMPs becoming open, searchable, useful documents (also highlighted in a recent report on DMPs from the University of Michigan by Jake Carlson). If you want to get involved, you can subscribe to the RDA Active DMPs Interest Group mailing list and connect with these distributed, international efforts.

Another way to engage is by commenting on recently submitted Horizon2020 DMPs exposed on the European Commission website (unfortunately, the commenting period is closed here and here, but one remains open until 15 May).

DMPRoadmap update
Back at the DMPRoadmap ranch, we’re busy working toward our MVP (development roadmap and other documentation available on the GitHub wiki). The MVP represents the merging of our two tools with some new enhancements (e.g., internationalization) and UX contributions to improve usability (e.g., redesign of the create plan workflow) and accessibility. We’ve been working through fluctuating developer resources and will update/confirm the estimated timelines for migrating to the new system in the coming weeks; current estimates are end of May for DMPonline and end of July for DMPTool. Some excellent news is that Bhavi Vedula, a seasoned contract developer for UC3, is joining the team to facilitate the DMPTool migration and help get us to the finish line. Welcome Bhavi!

In parallel, we’re beginning to model some active DMP pilot projects to inform our work on the new system and define future enhancements. The pilots are also intertwined with the RDA working group activities, with overlapping emphases on institutional and repository use cases. We will begin implementing use cases derived from these pilots post-MVP to test the potential for making DMPs active and actionable. More details forthcoming…

Upcoming events
The next scheduled stop on our traveling roadshow for active DMPs is the RDA Plenary 10 meeting in Montreal (19–21 Sept 2017), where working groups will provide progress updates. We’re also actively coordinating between the RDA Active DMPs IG and the FORCE11 FAIR DMPs group to avoid duplication of effort. So there will likely be active/FAIR/machine-actionable DMP activities at the next FORCE11 meeting in Berlin (25–27 Oct); stay tuned for details.

And there are plenty of other opportunities to maintain momentum, with upcoming meetings and burgeoning international efforts galore. We’d love to hear from you if you’re planning your own active DMP things and/or discover anything new so we can continue connecting all the dots. To support this effort, we registered a new Twitter handle @ActiveDMPs and encourage the use of the #activeDMPs hashtag.

Until next time
Source: RDA-DMP movings and shakings

On the right track(s) – DCC release draws nigh

blog post by Sarah Jones

Photo: Eurostar, from Flickr by red hand records (CC-BY-ND)

Preliminary DMPRoadmap out to test

We’ve made a major breakthrough this month, getting a preliminary version of the DMPRoadmap code out to test on DMPonline, DMPTuuli and DMPMelbourne. This has taken longer than expected but there’s a lot to look forward to in the new code. The first major difference users will notice is that the tool is now lightning quick. This is thanks to major refactoring to optimise the code and improve performance and scalability. We have also reworked the plan creation wizard, added multi-lingual support, ORCID authentication for user profiles, on/off switches for guidance, and improved admin controls to allow organisations to upload their own logos and assign admin rights within their institutions. We will run a test period for the next 1-2 weeks and then move this into production for DCC-hosted services.

Work also continues on additional features needed to enable the DMPTool team to migrate to the DMPRoadmap codebase. This includes additional enhancements to existing features, adding a statistics dashboard, email notifications dashboard, enabling a public DMP library, template export, creating plans and templates from existing ones, and flagging “test” plans (see the Roadmap to MVP on the wiki to track our progress). We anticipate this work will be finished in August and the DMPTool will migrate over the summer. When we issue the full release we’ll also provide a migration path and documentation so those running instances of DMPonline can join us in the DMPRoadmap collaboration.

Machine-actionable DMPs

Stephanie and Sarah are also continuing to gather requirements for machine-actionable DMPs. Sarah ran a DMP workshop in Milan last month where we considered what tools and systems need to connect with DMPs in an institutional context, and Stephanie has been working with Purdue University and UCSD to map out the institutional landscape. The goal is to produce maps/diagrams for two specific institutions and extend the exercise to others to capture more details about practices, workflows, and systems. All the slides and exercises from the DMP workshop in Milan are on the Zenodo RDM community collection, and we’ll be sharing a write-up of our institutional mapping in due course. I’m keen to replicate the exercise Stephanie has been doing with some UK unis, so if you want to get involved, drop me a line. We have also been discussing potential pilot projects with the NSF and Wellcome Trust, and have seen the DMP standards and publishing working groups proposed at the last RDA plenary host their initial calls. Case statements will be out for comment soon – stay tuned for more!

We have also been discussing DMP services with the University of Queensland in Australia who are doing some great work in this area, and will be speaking with BioSharing later this month about connecting up so we can start to trial some of our machine-actionable DMP plans.

The travelling roadshow

Our extended network has also been helping us to disseminate DMPRoadmap news. Sophie Hou of NCAR (National Center for Atmospheric Research) took our DMP poster to the USGS Community for Data Integration meeting (Denver, CO 16–19 May) and Sherry Lake will display it next at the Dataverse community meeting (Cambridge, MA 14-16 June). We’re starting an inclusive sisterhood of the travelling maDMPs poster. Display the poster, take a picture, and go into the Hall of Fame! Robin Rice and Josh Finnell have also been part of the street team taking flyers to various conferences on our behalf. If you would like a publicity pack, Stephanie will send them out stateside and Sarah will share them through the UK and Europe. Just email us your contact details and we’ll send you materials. The next events we’ll be at are the Jisc Research Data Network in York, the EUDAT and CODATA summer schools, the DataONE Users Group and Earth Science Information Partners meetings (Bloomington, IN), the American Library Association Annual Conference (Chicago, IL), and the Ecological Society of America meeting (Portland, OR). Catch up with us there!

Source: On the right track(s) – DCC release draws nigh

Manifesting Large and Bulk File Data Publications: Now a Reality!

The Dash team is excited to announce our June feature release: Large and Bulk File upload. Taking into consideration the need for datasets with large sizes and large numbers of files, as well as the practical problem of server timeouts, we have developed a new feature that allows up to 1,000 files or 100 GB* of data to be published per DOI.

To accomplish this, we are using a “manifest” workflow, which means that instead of uploading data directly from your computer, you may enter URLs for where your data are located (on a server or public site) for upload. Once uploaded, Dash will display the data in the same manner as direct upload. To reflect this new option, we have updated the Upload page so that you can choose between uploading locally (from your computer) or via a server. Information about size limits (local upload: 2 GB per file and 10 GB total; server upload: up to 1,000 files of any size, up to 100 GB* total) is listed on this page.
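The limits in the paragraph above can be read as a simple pre-flight check. This is a sketch under stated assumptions, not Dash's validator: `check_upload` is a hypothetical helper, sizes are in bytes, a gigabyte is taken as 1024^3 bytes, and the constants are the default limits from this post, which vary by tenant.

```python
from typing import List

GB = 1024 ** 3  # assumption: limits are binary gigabytes

# Default limits from this post; they vary by institutional tenant.
LOCAL_MAX_FILE = 2 * GB
LOCAL_MAX_TOTAL = 10 * GB
SERVER_MAX_FILES = 1000
SERVER_MAX_TOTAL = 100 * GB


def check_upload(sizes: List[int], method: str) -> List[str]:
    """List the limits an upload would break (empty list means it fits).

    `sizes` are file sizes in bytes; `method` is "local" or "server".
    """
    problems = []
    total = sum(sizes)
    if method == "local":
        oversized = sum(1 for s in sizes if s > LOCAL_MAX_FILE)
        if oversized:
            problems.append(f"{oversized} file(s) over 2 GB")
        if total > LOCAL_MAX_TOTAL:
            problems.append("total over 10 GB")
    elif method == "server":
        if len(sizes) > SERVER_MAX_FILES:
            problems.append("more than 1,000 files")
        if total > SERVER_MAX_TOTAL:
            problems.append("total over 100 GB")
    else:
        raise ValueError(f"unknown upload method: {method!r}")
    return problems


print(check_upload([3 * GB], "local"))   # ['1 file(s) over 2 GB']
print(check_upload([3 * GB], "server"))  # []
```

A single 3 GB file is too large for local upload but fits comfortably in a server ("manifest") upload, which is the case this feature was built for.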

Step 1: Enter URLs where data are located


Step 2: Validated files will appear in the Uploaded Files table, along with any other data files from current or former versions

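Steps 1 and 2 can be approximated as parsing the pasted URLs. The sketch below is hypothetical (`parse_manifest` and the example URLs are invented for illustration): it accepts one http(s) URL per line, skips blank lines and duplicates, takes each filename from the last path segment, and enforces the 1,000-URL limit. A real validator would also have to contact each server to learn a file's size, and sharing links from cloud services often do not end in a filename at all.

```python
from typing import List, Tuple
from urllib.parse import unquote, urlparse

MAX_URLS = 1000  # per-submission limit stated in this post


def parse_manifest(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split pasted text into (url, filename) pairs plus error messages."""
    entries: List[Tuple[str, str]] = []
    errors: List[str] = []
    seen = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:  # skip blank lines and repeated URLs
            continue
        seen.add(url)
        parts = urlparse(url)
        # Simplification: the filename is the decoded last path segment.
        name = unquote(parts.path.rsplit("/", 1)[-1])
        if parts.scheme not in ("http", "https") or not name:
            errors.append(f"cannot use: {url}")
        else:
            entries.append((url, name))
    if len(entries) > MAX_URLS:
        errors.append(f"{len(entries)} URLs given; the limit is {MAX_URLS}")
    return entries, errors


manifest = """https://data.example.edu/lab/run1.csv
https://data.example.edu/lab/run%202.tif

ftp://old.example.edu/x.dat
https://data.example.edu/lab/run1.csv"""
entries, errors = parse_manifest(manifest)
print([name for _, name in entries])  # ['run1.csv', 'run 2.tif']
print(errors)  # ['cannot use: ftp://old.example.edu/x.dat']
```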

The benefit of this workflow is that you do not have to watch your screen for hours while the data upload; instead, your data are uploaded on the back end, without involving your computer. This upload mechanism is also not limited to large files: it can be an easy way to transfer your data directly from a server, regardless of size.

One limitation of this process is that you cannot upload local data and server-hosted data in the same version. Though this may seem tricky, remember that Dash supports versioning: after the server-uploaded data are successfully published, you can go back in and add local files (or vice versa).
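The one-method-per-version rule, and the workaround of splitting uploads across versions, can be sketched as a small check. `version_upload_method` is a hypothetical helper, and it assumes each file added in a version is tagged with how it is being uploaded ("local" or "server").

```python
from typing import Iterable

ALLOWED = {"local", "server"}


def version_upload_method(sources: Iterable[str]) -> str:
    """Return the one upload method ("local" or "server") for a version.

    `sources` lists how each file added in this version is being uploaded.
    Raises ValueError for a mixed (or empty) version; the remedy is to
    publish one kind first and add the other in a new version.
    """
    methods = set(sources)
    unknown = methods - ALLOWED
    if unknown:
        raise ValueError(f"unknown upload method(s): {sorted(unknown)}")
    if len(methods) != 1:
        raise ValueError("a version needs files from exactly one upload method")
    return methods.pop()


# Server files go in the first version, local files in the next one.
plan = [["server", "server", "server"], ["local", "local"]]
print([version_upload_method(v) for v in plan])  # ['server', 'local']
```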

We do not yet support uploads from Google Drive, Box, or Dropbox, but we are investigating the sharing links needed to integrate uploads from the cloud. If you have feedback on making this feature, or any other feature, more accessible or valuable for researchers, please get in touch. Happy Data Publishing!

Note: To use this feature and publish your datasets, your data will need to be hosted on a server. Many institutions, departments, and labs have servers used to host data and information (good examples exist across the UC campuses, MIT, the University of Iowa, etc.). If you have any questions about servers on your campus or external resources, please contact your campus librarians.

*Size limits vary by institutional tenant; please check with your UC Data Librarians if you have any questions.