
Collecting Journal Data Policies: JoRD

My last two posts have related to IDCC 2013; that makes this post three in a row. Apparently IDCC is a gift that just keeps giving (albeit a rather short post in this case).

Today the topic is the JoRD project, funded by JISC. JoRD stands for Journal Research Data; the JoRD Policy Bank is basically a project to collect and summarize data policies for a range of academic journals.

From the JISC project website, this project aims to

provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with Research Data policies.

How to go about this? The project’s objectives (cribbed and edited from the project site):

Identify and consult with stakeholders; develop stakeholder requirements
Investigate the current state of data sharing policies within journals
Deliver recommendations on a central service to summarize journal research data policies and provide a reference for guidance and information on journal policies.

I’m most interested in #2: what are journals saying about data sharing? To tackle this, project members are collecting information about data sharing policies for the top 100 and bottom 100 Science Journals, and the top 100 and bottom 100 Social Science Journals. Based on the stated journal policies about data sharing, they fill out an extensive spreadsheet. I’m eager to see the final outcome of this data collection – my hunch is that most journals “encourage” or “recommend” data sharing, but do not mandate it.

I think of the JoRD Policy Bank as having two major benefits:

Educating Researchers. As you may be aware, many researchers are a bit slow to jump on the data sharing bandwagon. This is the case despite the fact that all signs point to future requirements for sharing at the time of publication (see my post about it, Thanks in Advance for Sharing Your Data). Once researchers come to terms with the fact that soon data sharing will not be optional, they will need to know how to comply. Enter JoRD Policy Bank!

Encouraging Publishers. The focus on stakeholder needs and requirements suggests that the outcomes of this project will provide guidance to publishers about how to proceed in their requirements surrounding data sharing. There might be a bit of peer pressure, as well: Journals don’t want to seem behind the times when it comes to data sharing, lest their credibility be threatened.

In general, the JoRD website is chock full of information about data sharing policies, open data, and data citation. Check it out!

C’mon researchers! Jump on the data sharing bandwagon! From purlem.com

Thoughts on Data Publication

If you read last week’s post on the IDCC meeting in Amsterdam, you may know that today’s post was inspired by a post-conference workshop on Data Publication, sponsored by the PREPARDE group. The workshop was “Data publishing, peer review and repository accreditation: everyone a winner?” (to access the workshop agenda, goals, and slides, go to the conference workshop website and scroll down to Workshop 6).

Basically the workshop focused on all things data publication, and incited lively discussion among those in attendance. Check out the workshop’s Twitter backchannel via this Storify by Sarah Callaghan of STFC. My previous blog post about data publication sums it up like this:

The concept of data publication is rather simple in theory: rather than relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens”. Data sets have inherent value that makes them standalone scholarly objects— they are more likely to be discovered by researchers in other domains and working on other questions if they are not associated with a specific journal and all of the baggage that entails.

Stealing shamelessly from Sarah’s presentation, I’m providing a brief overview of issues surrounding data publication for those not well-versed:

First, the benefits of data publication:

Allows credit to data producers and curators (via data citation and emerging altmetrics)
Encourages reuse of datasets and discourages duplication of effort
Encourages proper curation and management of data (you don’t want to share messy data, right?)
Ensures completeness of the scientific record, as well as transparency and reproducibility of research (fundamental tenets of the scientific method!)
Improves discoverability of datasets (they will never be discovered on that old hard drive in your desk drawer)

We had an internal meeting here at CDL yesterday about data publication. After running through this list of benefits for those in attendance, one of my colleagues asked the question: “Does listing these benefits work? Do researchers want to publish their data?” I didn’t hesitate to answer “No”.

Why not? The biggest reason is a lack of time. Preparing data for sharing and publication is laborious, and overstretched researchers aren’t motivated by these benefits given the current incentive structures in research (papers, papers, papers. And citation of those papers.). Of course, I think this is changing in the very near future. Check out my post on data sharing mandates in the works. So let’s go with the assumption that researchers want to publish. How do they go about this?

Methods for “publishing” data:

A personal or lab webpage. This is a common choice for researchers who wish to share data since they can maintain control of the datasets. However, data siloed on individual websites suffer from problems with stability, persistence, and discoverability. Plus, website maintenance often falls to the bottom of a researcher’s to-do list.
A disciplinary repository. This is a common solution for only a select few data types (e.g., genetic data). Most disciplines are still awaiting a culture change that will motivate researchers to share their data in this way.
An institutional repository. Of course, researchers have to know that this is an option (most don’t), and must then properly prepare their data for deposit.
Supplementary materials. In this case, the data accompany a primary journal article as supporting information. I recently shared data this way, but recognized that the data should also be placed in a curated repository. There are a few reasons for this apparent duplication:
- Supplemental materials are sometimes not available many years after publication due to broken links.
- Journals are not particularly excited about archiving lots of supplementary data, especially if it’s a large volume of data. This is not their area of expertise, after all.
Data article. This is a new-ish option: basically, you publish your data in a proper data journal (see this semi-complete list of data journals on the PREPARDE blog).

Wondering what a “data article” is? Let’s look to Sarah again:

A data article describes a dataset, giving details of its collection, processing, software, file formats, et cetera, without the requirement of novel analyses or ground-breaking conclusions.

That is, it’s a standalone product of research that can be cited as such. There is much debate surrounding such data articles. Among the issues are:

Is it really “publication”? How is this different from a landing page for the dataset that’s stored in a repository?
Traditional academic use of “publication” implies peer review. How do you review datasets?
How should publication differ depending on the discipline?

There are no easy answers to these questions, but I love hearing the debate. I’m optimistic that the data publication postdoc we are about to hire will have some great ideas to contribute. Stay tuned!

Related Data Pub blog posts:
- Data Publication: An Introduction
- NSF Recognizes Data as an Academic Product
- Data Publication – The First 500 Years, by Lisa Schiff
- Data Publication and the Coproduction of Quality, by Eric Kansa
- We’re hiring a data publication postdoc

All Things Data in Amsterdam

The International Digital Curation Conference is wrapping up today, and I feel like I just finished a big, tasty Thanksgiving dinner: full and slightly uncomfortable, but in the brain rather than the gut. IDCC is a meeting that draws about 300 individuals from all over the world. Participants include librarians, repository administrators, publishers, funders, information technology folks, and people working at all manner of data and archiving organizations. Get these people in the same room, and the result is interesting talks, an amazing Twitter backchannel, and novel ideas for collaboration. This was my first IDCC, and I was not disappointed.

Pre-conference workshops started on Monday, and I participated in a data management tools update (Data Management Planning: what’s happened, what’s happening and what’s coming next?), organized primarily by Martin Donnelly of the Digital Curation Centre in the UK. It was interesting to hear about the future of the DMPTool and DMPOnline, as well as an overview of current data policies in the UK, Europe, Australia, and the US. Martin and I are arranging a similar workshop for the iConference, held next month in Fort Worth, TX.

On Tuesday, I was inundated with really great talks and conversations. The keynote by Ewan Birney of the European Bioinformatics Institute, on bioinformatics infrastructure in Europe, was chock full of great examples of how data sharing can benefit research. There was also a talk by Kaitlin Thaney from Digital Science, who discussed the many projects they are funding, including Figshare and Altmetric. These two talks highlighted the many approaches people are taking to tackle digital data: we need both infrastructure and tools, as well as incentives and changes in the culture of research data.

Fun fact: Eddie and Alex Van Halen are Dutch! Photo from bumslogic.wordpress.com

Tuesday afternoon was devoted to a poster session where I schmoozed with folks over the DataUp poster. The DataUp team (Trisha Cruse, John Kunze, and myself) won 2nd place for best poster; first place went to the Right Field project (Right Field: Spreadsheet Annotation by Stealth), which was especially interesting given how closely aligned this project is with DataUp. Wednesday was more talks, meetings, and discussions. I’m excited about the post-conference workshop today on data publication. I’m guessing I will be inspired by this workshop and my next blog post will be about all things data publication.

Hungry for some Dutch music trivia? Wikipedia has a great list of songs about Amsterdam… including one by Van Halen.

My Resolutions For You

New Year’s Day 1918 at Ocean Beach. From Calisphere courtesy of San Diego History Center.

Since this is the very first 2013 post, I feel it necessary to trot out some New Year’s Resolutions. These are not my resolutions, however – instead I offer them as additions to researchers’ own lists. If just a small fraction of the data-producing researchers out there made these five resolutions, we would be in a much more comfortable place this time next year for data sharing, reproducibility, and meeting ethical responsibilities.

1. Write up a data management plan for your current project.

By now you all know that data management plans (DMPs) are required by many funding agencies. This does not, however, restrict good planning to future projects. Pick a current research project, and based on your current knowledge of the project, write up a data management plan. I know, I know – the project has already started; what’s the point? The value of this exercise is fourfold:

You will be forced to think carefully about the data you are producing (or already produced), relationships between those data, and how best to document and preserve those datasets.
You can do some mid-course correction of bad organization habits, poor metadata documentation, or insufficient security and backup plans.
You can make decisions about preserving your data for the long term, including selecting a repository and beginning the process of metadata creation.
You will write a better DMP in the future.

I wrote about retroactive data management back in 2011 in a post called Data Hangover Part 2: Going Retro. Looking for DMP inspiration? Read this more recent post: Good DMP Examples + Going Beyond Two Pages.

2. Get better at metadata.

Metadata is hard. The concept is easy enough to understand (the overused “data about your data” phrase comes to mind), but the actual process of creating good metadata is a challenge for even the most savvy data steward. No one expects you to be a master metadata creator, but you should know how to put together some machine-readable, standardized metadata for your datasets. If you are clueless about where to start, talk to a local librarian. You can also check with repositories you might want to use later to archive your data – they often have guidelines about what metadata you need to generate. Ease your way into the wide world of metadata with some free software tools: check out DataUp (for tabular/Excel data), Morpho (ecological data), or Tkme (geospatial data).

I also have a few blog posts about metadata: peruse this list.
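If “machine-readable metadata” still sounds abstract, here is a minimal sketch in Python of what a bare-bones record might look like. The field names loosely follow the Dublin Core element set, but the list of required fields, the function names, and every example value are my own assumptions for illustration. A real repository will have its own schema and guidelines, so treat this as a starting point rather than a standard.

```python
import json

# Assumed minimal field list, loosely borrowed from Dublin Core element names.
# A real repository may require more, fewer, or differently named fields.
REQUIRED_FIELDS = ("title", "creator", "description", "date", "format", "rights")


def build_metadata(**fields):
    """Return a metadata record, raising an error if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    return {name: fields[name] for name in REQUIRED_FIELDS}


def to_json(record):
    """Serialize the record as indented JSON that people and programs can both read."""
    return json.dumps(record, indent=2)


# Hypothetical example values; replace them with details of your own dataset.
example = build_metadata(
    title="Stream temperature readings, 2011 field season",
    creator="A. Researcher",
    description="Hourly water temperature from three hypothetical sites.",
    date="2011-09-30",
    format="text/csv",
    rights="CC0",
)
print(to_json(example))
```

Saving that output in a file next to the data means the description travels with the dataset, and the check for empty fields is a small nudge to fill in the parts that are easy to skip.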

3. Provide open access to one of your published articles.

I wrote about making all of your work Open Access a few months back. It’s not hard, but it does take a little bit of time to figure out how best to proceed based on where the article was published and what repositories are available for you. This year, pledge to get one of your articles out there as an “OA Green” publication by following my recommendations at the post above.

4. Make one dataset publicly available.

You know you have it: some sad, orphaned dataset from grad school or a postdoc that never really got its day in the sunshine. Instead of letting it drift into obsolescence, make it publicly available. Let’s be honest – you weren’t going to publish with those data. No matter how big or small the dataset, generate some metadata (if you haven’t already) and push the data out to the community using a repository. Don’t know of a repository? Use the DataUp tool, hosted here at CDL. All of the data deposited into ONEShare via DataUp are publicly available. Bonus: DataUp also helps you create metadata and get a persistent identifier so others can cite your data.

5. Promote good data stewardship in your lab, department, or institution.

Share this blog post with your lab mates, students, or advisor. Suggest a brown bag discussion of open access for your department. Meet with one of your librarians to talk about options for archiving your data. Have a DMP writing session one afternoon with some colleagues. The culture change towards good data stewardship is underway; it’s just a matter of when you decide to join.