

Building an RDM Maturity Model: Part 4

By John Borghi

Researchers are faced with rapidly evolving expectations about how they should manage and share their data, code, and other research products. These expectations come from a variety of sources, including funding agencies and academic publishers. As part of our effort to help researchers meet these expectations, the UC3 team spent much of last year investigating current practices. We studied how neuroimaging researchers handle their data, examined how researchers use, share, and value software, and conducted interviews and focus groups with researchers across the UC system. All of this has reaffirmed our perception that researchers and other data stakeholders often think and talk about data in very different ways.

Such differences are central to another project, which we’ve referred to alternately as an RDM maturity model and an RDM guide for researchers. Since its inception, the goal of this project has been to give researchers tools to self-assess their data-related practices and to access the skills and experience of data service providers within their institutional libraries. Drawing upon tools with convergent aims, including maturity-based frameworks and visualizations like the research data lifecycle, we’ve worked to ensure that our tools are user friendly, free of jargon, and adaptable enough to meet the needs of a range of stakeholders, including different research, service provider, and institutional communities. To this end, we’ve renamed this project yet again to “Support your Data”.

Image showing some of the support structure for the Golden Gate Bridge. This image also nicely encapsulates how many of the practices described in our tools are essential to the research process but are often hidden from view.

What’s in a name?

Because our tools are intended to be accessible to people with a broad range of perceptions, practices, and priorities, coming up with a name that encompasses complex concepts like “openness” and “reproducibility” proved to be quite difficult. We also wanted to capture the spirit of terms like “capability maturity” and “research data management (RDM)” without referencing them directly. After spending a lot of time trying to come up with something clever, we decided that the name of our tools should describe their function. Since the goal is to support researchers as they manage and share data (in ways potentially influenced by expectations related to openness and reproducibility), why not just use that?

Recent Developments

In addition to thinking through the name, we’ve also refined the content of our tools. The central element, a rubric that allows researchers to quickly benchmark their data-related practices, is shown below. As before, it highlights how the management of research data is an active and iterative process that occurs throughout the different phases of a project. Activities in different phases are represented in different rows. Proceeding from left to right, a series of declarative statements describes specific activities within each phase, ordered by how well they foster future access to and use of the data.

The “Support your Data” rubric. Each row is complemented by a one-page guide intended to help researchers advance their data-related practices.

The four levels, “ad hoc”, “one-time”, “active and informative”, and “optimized for re-use”, are intended to be descriptive rather than prescriptive.

Each row of the rubric is tied to a one-page guide that provides specific information about how to advance practices as desired or required. Development of the content of the guides has proceeded sequentially. During the autumn and winter of 2017, members of the UC3 team met to discuss issues relevant to each phase, reduce the use of jargon, and identify how content could be localized to meet the needs of different research and institutional communities. We are currently working on revising the content based on suggestions made during these meetings.

Next Steps

Now that we have scoped out the content, we’ve begun to focus on the design aspect of our tools. Working with CDL’s UX team, we’ve begun to think through the presentation of both the rubric and the guides in physical media and online.

As always, we welcome any and all feedback about content and application of our tools.

Dash: 2017 in Review

The goal for Dash in 2017 was to build out features that would make Dash a desirable place to publish data. While we continue to work with the research community to find incentives to publish data generally, the small team of us working on Dash wanted to take a moment to thank everyone who published data this year.

In 2017 we worked in two week sprint intervals to release 26 features and instances (not including fixes).

In 2018 we have one major focus: integrate into researcher workflows to make publishing data a more common practice.

To do so, we will be working with the community.

Follow along with our Github and Twitter and please get in touch with us if you have ideas or experiences to share for making data publishing a more common practice in the research environment.

Test-driving the Dash read-only API

The Dash Data Publication service now allows read-only access to dataset metadata and public files through a RESTful API. Documentation is available at https://dash.ucop.edu/api/docs/index.html.

There are a number of ways to test out and access this API, such as through programming-language libraries or with the Linux curl command. This short tutorial gives examples of accessing the API using Postman, a GUI tool for testing and browsing APIs that is available for the major desktop operating systems. If you’d like to follow along, please download Postman from https://www.getpostman.com/.

We are looking for feedback on the first of our Dash APIs before we embark on building our submission API. Please get in touch with us with feedback or if you would be interested in setting up an API integration with the Dash service.

Create a Dash Collection in Postman

After you’ve installed Postman, open it and create a Dash collection to hold your queries against the API.

1. Open Postman.

2. Click New > Collection.

3. Enter the collection name and click Create.

Set Up Your First Request

1. Click the folder icon for the collection you just set up.

2. Click Add requests under this collection.

3. Fill in a name for your request, select the Dash collection you created earlier, and click Save to Dash to create it.

4. Click on the request you just created in the left bar and then click the Headers tab.

5. Enter the following key and value in the header list. Key: Content-Type and Value: application/json. This header ensures that you’ll receive JSON data.

6. Enter the request URL in the box toward the top of the page. Leave the request type set to “GET,” enter https://dash.ucop.edu/api for the URL, and click Save.
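If you prefer a script to a GUI, the same request can be sketched with Python’s standard library. The URL and header below match the Postman setup above; the actual send is left commented out since it requires network access, and this is only one of many ways to call the API.

```python
from urllib.request import Request

# Equivalent of the Postman setup above: a GET request to the Dash API
# root with a Content-Type header asking for JSON.
req = Request("https://dash.ucop.edu/api", method="GET")
req.add_header("Content-Type", "application/json")

# To actually send the request (requires network access):
#   from urllib.request import urlopen
#   body = urlopen(req).read()
```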

Try Your Request

1. Test out your request by clicking the Send button.
2. If everything is set up correctly, you’ll see a JSON response.

Information about the API is returned in JavaScript Object Notation (JSON) and includes a few features worth becoming familiar with:
– A links section in the JSON exposes Hypertext Application Language (HAL) links that can guide you to other parts of the API, much like links in a web page allow you to browse other parts of a site.
– The self link refers to the current request.
– Other links can allow you to get further information to create other requests in the API.
– The curies section leads to some basic documentation that may be used by some software.
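To make the link structure concrete, here is a short sketch that parses a HAL-style response and pulls out the links described above. The JSON payload is a hypothetical, abbreviated example following the HAL convention; the live response from the service may contain additional fields.

```python
import json

# Hypothetical, abbreviated response from GET https://dash.ucop.edu/api,
# following the HAL convention: links live under "_links".
sample = json.loads("""
{
  "_links": {
    "self": {"href": "/api"},
    "stash:datasets": {"href": "/api/datasets"},
    "curies": [{"name": "stash",
                "href": "https://dash.ucop.edu/api/docs/{rel}.html",
                "templated": true}]
  }
}
""")

links = sample["_links"]
print(links["self"]["href"])            # the current request
print(links["stash:datasets"]["href"])  # follow this to list datasets
```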

Following Links and Viewing Dataset Information

Postman has a nice feature that allows you to follow links in an API to create additional requests.

1. Try it out by clicking the URL path associated with stash:datasets, which shows as /api/datasets.

2. You’ll see a new tab open for your new request toward the top of the screen, where you can submit or save the new request.

3. If you send this request you will see a lot of information about datasets in Dash.

Some things to point out about this request:
– The top-level links section contains paging links because this request returns a list of datasets. Not all datasets are returned at once, but if you needed to see more you could go to the next page.

– The list contains a count of items in the current page and a total for all items.
– When you look at the embedded datasets you’ll see additional links for each individual dataset, which you could also follow.

– The dataset information shown here is the metadata for the most recent successfully submitted version of each dataset.
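The paging behavior described above can be sketched in a few lines: keep following the `next` link until it is absent, collecting the embedded datasets from each page. The page structure here (`_links`, `_embedded`, `count`, `total`) mirrors the HAL layout discussed above, but the pages themselves are canned stand-ins rather than live responses, and the field names should be checked against the API documentation.

```python
def fetch(path, pages):
    # Stand-in for an HTTP GET; 'pages' maps paths to canned JSON pages.
    return pages[path]

# Two illustrative pages of a three-dataset listing.
pages = {
    "/api/datasets?page=1": {
        "total": 3, "count": 2,
        "_links": {"next": {"href": "/api/datasets?page=2"}},
        "_embedded": {"stash:datasets": [{"id": "ds1"}, {"id": "ds2"}]},
    },
    "/api/datasets?page=2": {
        "total": 3, "count": 1,
        "_links": {},
        "_embedded": {"stash:datasets": [{"id": "ds3"}]},
    },
}

def all_datasets(start, pages):
    # Walk the paging links, accumulating datasets from each page.
    path, out = start, []
    while path:
        page = fetch(path, pages)
        out.extend(page["_embedded"]["stash:datasets"])
        path = page["_links"].get("next", {}).get("href")
    return out

print([d["id"] for d in all_datasets("/api/datasets?page=1", pages)])
```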

Hopefully this gives a general idea of how the API can be used. From here, you can create additional requests to browse the rest of the Dash API.

Hints for Navigating the Dash Data Model

As you browse through the different links in the Dash API, keep the following in mind.
– A dataset may have multiple versions. If it has only been edited or submitted once in the UI it will still have one version.
– The metadata shown at the dataset level is based on the latest published version and the dataset indicates the version number it is using to derive this metadata.
– Each version has descriptive metadata and files associated with it.
– To download files, look for the stash:download links. There are downloads for a dataset, a version and for an individual file. These links are standard HTTP downloads that could be downloaded using a web browser or other HTTP client.
– If you know the DOI of the dataset you wish to view, use a GET request for /api/datasets/<doi>.
– The DOI would be in a format such as doi:10.5072/FKK2K64GZ22 and needs to be URL encoded when included in the URL.
– See, for example, https://www.w3schools.com/tags/ref_urlencode.asp or https://www.urlencoder.org/, or use the URL-encoding methods available in most programming languages.
– For datasets that are currently private for peer review, downloads will not become available until the privacy period has passed.
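As one of those programming-language options, Python’s standard library can do the DOI encoding mentioned above. This uses the example DOI from the hints list and builds the `/api/datasets/<doi>` path:

```python
from urllib.parse import quote

# URL-encode the example DOI before placing it in the request path.
# safe="" forces ":" and "/" to be percent-encoded as well.
doi = "doi:10.5072/FKK2K64GZ22"
encoded = quote(doi, safe="")
path = f"/api/datasets/{encoded}"
print(path)  # /api/datasets/doi%3A10.5072%2FFKK2K64GZ22
```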