At RDA10 in Montreal I gave a presentation on Dash in the Repository Platforms for Research Data IG session. The session was focused on backend technology and technology communities for repository platforms. I talked a bit about the Dash open source software and features but walked away thinking “How productive is it to discuss software systems to support research data at length? Is adoption based on technology?”
The answers are: not productive, and no.
Following RDA10, I spent months talking with as many researchers and institutions as possible to figure out how much researchers know about data publishing and what would incentivize them to make it a common practice.
Researchers are the end users of research data publishing platforms, and yet they provide the least input into these systems.
And if you think that is confusing, there is an additional layer of disorder: “researchers” is used as an umbrella term for scientists and humanists at various career levels, who can have drastically different opinions and values based on discipline and status.
I visited labs and took PIs, grad students, and postdocs to coffee at UCSF, UC Berkeley, and UC Santa Cruz. Coming from a science background and having spent time at PLOS convincing authors to make their data available, I thought I had a pretty good sense of incentives, but I needed to span disciplines and leave behind the mindset of “you have to make your data available, or your paper will not be published” to hear researchers’ honest answers. Here’s what I found:
People like the idea of data publishing in theory, but in practice, motivation is lacking and excuses are prominent.
This is not surprising, though. The following is an example scenario (with real quotes) of how data publishing is perceived at various career stages (to control for discipline, this scenario takes place within biomedical research):
Grad Student: “Data publishing sounds awesome, I would totally put my data out there when publishing my work but it’s really up to my PI and my PI doesn’t think it is necessary.”
Postdoc: “I like the idea, but if we put data in here, are people going to use my data before I can publish 3 Nature papers as first author?”
PI: “I like the idea of having my students put their work in an archive so I can have all research outputs from the lab in one place, but until my Vice Chancellor of Research (VCR) tells me it is a priority I probably won’t use it.”
VCR: “Funder and Publisher mandates aren’t incentivizing enough?”
Publisher: “We really believe the funder mandates are the stick here.”
As you can tell, there is no consensus of understanding, and there is a gap between the theoretical and the practical implementation of data publishing. As one postdoc at UCSF said, “If I am putting on my academic hat, of course my motivation is the goodness of it. But, practically speaking, I’m not motivated to do anything.” With each stakeholder holding a different perspective, it is easy to see how difficult it is to gauge interest in data publishing!
Other reasons adoption of data publishing practices is difficult:
At conferences and within the scholarly communication world, we speak in jargon about sticks (mandates) and carrots (reproducibility, transparency). We are talking to each other: people who have already bought into these incentives and needs and are living in an echo chamber. We forget that these mandates and reasons for open data are neither well understood by researchers themselves nor effective. Mandates and justifications about being “for the good of science” are not consistently understood across the lab. PIs are applying for grants and writing up Data Management Plans (DMPs), but the grad students and postdocs are doing the data analysis and submitting the paper. There is plenty of room here for miscommunication, misinformation, and difficulty. We also say that reproducibility, transparency, and getting credit for your work are wide-ranging carrots, but reproducibility and transparency initiatives vary by field. Getting credit for publishing data seems easy (as with articles): authorship on a dataset and citations of its DOI credit the researchers who first published the data. But how can we say that researchers are currently “getting credit” for their data publications if citing data isn’t common practice, few publishers support data citations, and tenure committees aren’t looking at the reach of data?
We spend time telling one another that open data is a success because publishers have released X many data statements and repositories hold X many datasets. But editors and reviewers typically do not check (or want to check) that the data associated with a publication are actually the underlying data or that they are FAIR, and many high-volume repositories accept any sort of work (conference talks, PDFs, posters). How many articles have their associated data publicly available and in a usable format? How many depositions to repositories are usable research data? We must take these metrics with a grain of salt and understand that while we are making progress, there are various avenues we must invest in to make the open data movement a success.
All of this comes back to researcher education and lowering the activation energy (i.e., making data publishing a common and accepted practice).
A provocative conversation to bring people together:
In my presentation at CNI I scrolled through a number of quotes from researchers that I gathered during these coffee talks, and the audience laughed at many of them. The quotes are funny (or sad, or realistic, or [insert every range of emotion]), but even this reaction is reason for us to rethink how we drive adoption of research data management and open data practices. Talking about technologies and features that researchers haven’t asked for is getting ahead of ourselves.
Right now there should be one focus: finding incentives and ways to integrate into workflows that effectively get researchers to open up and preserve their data.
When presenting this I was apprehensive but confident: I was presenting opinions and experiences, and hearing someone say “we’re doing it wrong” usually does not come with applause. What came of the presentation was a 30-minute discussion full of genuine experiences, honest opinions, and advice. Some discussion points that came up:
- Yale University: “Find the pain” — talk to researchers not about their dream features but about what would really help them with their data needs
- Elsevier, institutions: A debate about what counts as a Supporting Information (SI) file and whether SI files are a gateway drug we should support. Note: I and a few others agreed that no, publishing a table that is already in the article should not be rewarded; that would be positive reinforcement that common practices are good enough
- Duke University: Promote open and preserved data as a way for PIs to reduce panic as students join and leave the lab, since the lab retains an archived set of work from past grad students (who still receive authorship of the dataset)
- Claremont McKenna College: Are incentives and workflows different per institution and institution type, or should the focus be on domains/disciplines? Note: Researchers typically do not orient themselves at the institution level but rather look to their field, so disciplines may be the better place to align (rather than institutional policies and incentives).
The general consensus was that we have to re-focus on researcher needs and integrate into researcher workflows. To do this successfully:
- We need to check our language.
- We need to ensure that our primary drive in this community is to build services and tools that make open data and data management common practices in the research workflows.
- We need to share our experiences and work with all research stakeholders to understand the landscape and needs (and not refer to an unrealistic lifecycle).
So, let’s work together. Let’s talk to as many researchers, in as many domains and at as many career levels, as we can in 2018. Let’s share these experiences when we meet at conferences and on social media. And let’s focus on adoption of a practice (data publishing) instead of spotlighting technologies, to make open data a common, feasible, and incentivized success.