Since this is the very first 2013 post, I feel it necessary to trot out some New Year’s Resolutions. These are not my resolutions, however – instead I offer them as additions to researchers’ own lists. If just a small fraction of the data-producing researchers out there made these five resolutions, we would be in a much more comfortable place this time next year for data sharing, reproducibility, and meeting ethical responsibilities.
1. Write up a data management plan for your current project.
By now you all know that data management plans (DMPs) are required by many funding agencies. This does not, however, restrict good planning to future projects. Pick a current research project, and based on your current knowledge of the project, write up a data management plan. I know, I know – the project has already started; what’s the point? The value of this exercise is fourfold:
- You will be forced to think carefully about the data you are producing (or already produced), relationships between those data, and how best to document and preserve those datasets.
- You can do some mid-course correction of bad organization habits, poor metadata documentation, or insufficient security and backup plans.
- You can make decisions about preserving your data for the long term, including selecting a repository and beginning the process of metadata creation.
- You will write a better DMP in the future.
I wrote about retroactive data management back in 2011 in a post called Data Hangover Part 2: Going Retro. Looking for DMP inspiration? Read this more recent post: Good DMP Examples + Going Beyond Two Pages.
2. Get better at metadata.
Metadata is hard. The concept is easy enough to understand (the overused “data about your data” phrase comes to mind), but the actual process of creating good metadata is a challenge for even the most savvy data steward. No one expects you to be a master metadata creator, but you should know how to put together some machine-readable, standardized metadata for your datasets. If you are clueless about where to start, talk to a local librarian. You can also check with repositories you might want to use later to archive your data – they often have guidelines about what metadata you need to generate. Ease your way into the wide world of metadata with some free software tools: check out DataUp (for tabular/Excel data), Morpho (ecological data), or Tkme (geospatial data).
I also have a few blog posts about metadata: peruse this list.
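If "machine-readable metadata" still sounds abstract, here is a minimal sketch of what a record for a small tabular dataset could look like. It is not the output of DataUp, Morpho, or Tkme. The function name `describe_csv`, the sample data, and the field names (loosely borrowed from common Dublin Core terms such as title, creator, and description) are all assumptions made for illustration. A real repository will tell you which standard and which fields it expects.

```python
import csv
import io
import json


def describe_csv(csv_text, title, creator, description):
    """Build a small metadata record for tabular data.

    Hypothetical helper: the field names are illustrative only,
    not a repository's required schema.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                 # first row holds the column names
    row_count = sum(1 for _ in reader)    # remaining rows are data rows
    return {
        "title": title,
        "creator": creator,
        "description": description,
        "format": "text/csv",
        "columns": header,
        "row_count": row_count,
    }


# Made-up sample data standing in for a real file.
sample = "site,date,temp_c\nA,2012-06-01,14.2\nB,2012-06-01,15.8\n"

record = describe_csv(
    sample,
    title="Stream temperatures",
    creator="J. Researcher",
    description="Hypothetical example data",
)

# JSON is one common machine-readable format; many standards use XML instead.
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

The point is only that the description lives in a structured form that software can parse, rather than in a free-text README. The tools listed above produce much richer records that follow an established standard.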
3. Provide open access to one of your published articles.
I wrote about making all of your work Open Access a few months back. It’s not hard, but it does take a little bit of time to figure out how best to proceed based on where the article was published and what repositories are available for you. This year, pledge to get one of your articles out there as an “OA Green” publication by following my recommendations at the post above.
4. Make one dataset publicly available.
You know you have it: some sad, orphaned dataset from grad school or a postdoc that never really got its day in the sunshine. Instead of letting it drift into obsolescence, make it publicly available. Let’s be honest – you weren’t going to publish with those data. No matter how big or small the dataset, generate some metadata (if you haven’t already) and push the data out to the community using a repository. Don’t know of a repository? Use the DataUp tool, hosted here at CDL. All of the data deposited into ONEShare via DataUp are publicly available. Bonus: DataUp also helps you create metadata and get a persistent identifier so others can cite your data.
5. Promote good data stewardship in your lab, department, or institution.
Share this blog post with your lab mates, students, or advisor. Suggest a brown bag discussion of open access for your department. Meet with one of your librarians to talk about options for archiving your data. Have a DMP writing session one afternoon with some colleagues. The culture change towards good data stewardship is underway; it’s just a matter of when you decide to join.