Skip to main content

Thoughts on Data Publication

CDL UC3,

If you read last week’s post on the IDCC meeting in Amsterdam, you may know that today’s post was inspired by a post-conference workshop on Data Publication, sponsored by the PREPARDE group. The workshop was “Data publishing, peer review and repository accreditation: everyone a winner?” (to access the workshop agenda, goals, and slides, go to the conference workshop website and scroll down to Workshop 6).

Basically the workshop focused on all things data publication, and incited lively discussion among those in attendance. Check out the workshop’s Twitter backchannel via this Storify by Sarah Callaghan of STFC.  My previous blog post about data publication sums it up like this:

The concept of data publication is rather simple in theory: rather than relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens”.  Data sets have inherent value that makes them standalone scholarly objects— they are more likely to be discovered by researchers in other domains and working on other questions if they are not associated with a specific journal and all of the baggage that entails.

Stealing shamelessly from Sarah’s presentation, I’m providing a brief overview of issues surrounding data publication for those not well-versed:

First, the benefits of data publication:

We had an internal meeting here at CDL yesterday about data publication. After running through this list of benefits for those in attendance, one of my colleagues asked the question: “Does listing these benefits work? Do researchers want to publish their data?” I didn’t hesitate to answer “No”.

Why not? The biggest reason is a lack of time. Preparing data for sharing and publication is laborious, and overstretched researchers aren’t motivated by these benefits given the current incentive structures in research (papers, papers, papers. And citation of those papers.). Of course, I think this is changing in the very near future. Check out my post on data sharing mandates in the works. So let’s go with the assumption that researchers want to publish. How do they go about this?

Methods for “publishing” data:

Wondering what a “data article” is? Let’s look to Sarah again:

A data article describes a dataset, giving details of its collection, processing, software, file formats, et cetera, without the requirement of  novel analyses or ground-breaking conclusions.

That is, it’s a standalone product of research that can be cited as such. There is much debate surrounding such data articles. Among the issues are:

There are no easy answers to these questions, but I love hearing the debate. I’m optimistic that the forthcoming person we hire as a data publication postdoc will have some great ideas to contribute. Stay tuned!

Amsterdam! CC-BY license, C. Strasser
Amsterdam! CC-BY license, C. Strasser