
Where’s the adoption? Shifting the Focus of Data Publishing in 2018

By Daniella Lowenberg

At RDA10 in Montreal, I gave a presentation on Dash in the Repository Platforms for Research Data IG session. The session focused on backend technology and technology communities for repository platforms. I talked a bit about the Dash open-source software and its features, but walked away thinking: “How productive is it to discuss software systems to support research data at length? Is adoption based on technology?”

The answers are: not productive, and no.

Following RDA10, I spent months talking with as many researchers and institutions as possible to figure out how much researchers know about data publishing and what would incentivize them to make it a common practice.

Researchers are the end users of research data publishing platforms and yet they are providing the least amount of input into these systems.

And if you think that is confusing, there is an additional layer of disorder: “researchers” is used as an umbrella term for various levels of scientists and humanists who can have drastically different opinions and values based on discipline and status.

I visited labs and took PIs, grad students, and postdocs to coffee at UCSF, UC Berkeley, and UC Santa Cruz. Coming from a science background and spending time convincing authors to make their data available at PLOS, I thought I had a pretty good sense of incentives, but I needed to span disciplines and leave the mindset of “you have to make your data available, or your paper will not be published” to hear researchers’ honest answers. Here’s what I found:

People like the idea of data publishing in theory, but in practice, motivation is lacking and excuses are prominent.

This is not surprising, though. The following example scenario (with real quotes) shows how data publishing is perceived at different career stages (for some control, the scenario takes place within biomedical research):

Grad Student: “Data publishing sounds awesome, I would totally put my data out there when publishing my work but it’s really up to my PI and my PI doesn’t think it is necessary.”

Postdoc: “I like the idea, but if we put data in here, are people going to use my data before I can publish 3 Nature papers as first author?”

PI: “I like the idea of having my students put their work in an archive so I can have all research outputs from the lab in one place, but until my Vice Chancellor of Research (VCR) tells me it is a priority I probably won’t use it.”

VCR: “Funder and Publisher mandates aren’t incentivizing enough?”

Publisher: “We really believe the funder mandates are the stick here.”

As you can tell, there is no shared understanding, and there is a gap between the theoretical and practical implementation of data publishing. As one postdoc at UCSF said, “If I am putting on my academic hat, of course my motivation is the goodness of it. But, practically speaking, I’m not motivated to do anything.” With each stakeholder holding a different perspective, it is easy to see how difficult it is to gauge interest in data publishing!

Other reasons adoption of data publishing practices is difficult:

At conferences and within the scholarly communication world, we speak in jargon about sticks (mandates) and carrots (reproducibility, transparency). We are talking to each other: people who have already bought into these incentives and needs, living in an echo chamber. We forget that researchers themselves often do not understand these mandates and reasons for open data, nor find them motivating. Mandates and justifications about being “for the good of science” are not consistently understood across the lab. PIs are applying for grants and writing Data Management Plans (DMPs), but the grad students and postdocs are doing the data analysis and submitting the paper. There is plenty of room here for miscommunication, misinformation, and difficulty.

We also say that reproducibility, transparency, and getting credit for your work are wide-ranging carrots, but reproducibility and transparency initiatives vary by field. Getting credit for publishing data seems easy (as with articles): authorship on a dataset and citations of its DOI credit the researchers who first published the data. But how can we say that researchers are “getting credit” for their data publications right now, if citing data isn’t common practice, few publishers support data citations, and tenure committees aren’t looking at the reach of data?

We spend time telling one another that open data is a success because publishers have released X many data statements and repositories hold X many datasets. But editors and reviewers typically do not check (or want to check) the data associated with publications to confirm that it is the underlying data or that it is FAIR, and many high-volume repositories accept any sort of work (conference talks, PDFs, posters). How many articles have their associated data publicly available and in a usable format? How many repository deposits are usable research data? We must take these metrics with a grain of salt and recognize that, while we are making progress, we must invest in several avenues to make the open data movement a success.

All of this comes down to researcher education and lowering the activation energy (i.e., making data publishing a common and accepted practice).

A provocative conversation to bring people together:

In my presentation at CNI, I scrolled through a number of quotes from researchers that I gathered during these coffee talks, and the audience laughed at many of them. The quotes are funny (or sad, or realistic, or [insert every range of emotion]), but even this reaction is reason for us to rethink how we drive adoption of research data management and open data practices. Talking about technologies and features that researchers haven’t requested is getting ahead of ourselves.

Right now there should be one focus: finding incentives and ways to integrate into workflows that effectively get researchers to open up and preserve their data.

When presenting this, I was apprehensive but confident: I was presenting opinions and experiences, but hearing someone say “we’re doing it wrong” rarely comes with applause. What came of the presentation was a 30-minute discussion full of genuine experiences, honest opinions, and advice. Some discussion points that came up:

The general consensus was that we have to re-focus on researcher needs and integrate into researcher workflows. To do this successfully:

So, let’s work together. Let’s talk to as many researchers as possible, in as many domains and at as many career levels as we can, in 2018. Let’s share these experiences when we meet at conferences and on social media. And let’s focus on adoption of a practice (data publishing) instead of spotlighting technologies, to make open data a common, feasible, and incentivized success.