You know what they say: timing is everything. Time enters the data management and stewardship equation at several points and warrants discussion here. Why timeliness? Last week at the University of North Texas’ Open Access Symposium, several great speakers touched on the timeliness of data management, organization, and sharing. It led me to wonder whether there is any agreement about the timing of data-related activities, so here I’ve posted my opinions about timing at a few points in the data life cycle. Feel free to comment on this post with your own opinions.
1. When should you start thinking about data management? The best answer to this question is as soon as possible. The sooner you plan, the less likely you are to be surprised by issues like metadata standards or funder requirements (see my previous DCXL post about things you will wish you had thought about documenting). The NSF mandate for data management plans is a great motivator for thinking sooner rather than later, but let’s face facts: the DMP is limited to two pages, and you can write one that might pass muster without thinking very carefully about your data. I encourage everyone to go well beyond funder requirements and thoughtfully plan out your approach to data stewardship. Spend plenty of time doing this, and return to your plan often during your project to update it.
2. When should you start archiving your data? By archiving, I do not mean backing up your data (the answer to that is: constantly). I mean depositing your data in a repository for long-term (20+ years) storage. The timing here is a more complicated question. Issues to consider include:
- Is your data collection ongoing? Continuously updated sensor or instrument data should be archived starting as soon as collection begins.
- Is your dataset likely to go through many versions? You might wait to begin archiving until you are close to your final version.
- Are others likely to want access to your data soon? Especially colleagues or co-authors? If the answer is yes, begin archiving early so that you are all using the same datasets for analysis.
3. When should you make your data publicly accessible? My favorite answer to this question is also as soon as possible. But this might mean different things for different scientists. For instance, making your data available in near-real time, either on a website or in a repository that supports versioning, allows others to use it, comment on it, and collaborate with you while you are still working on the project. This approach has its benefits, but also tends to scare off some scientists who are worried about being scooped. So if you aren’t an open data kind of person, you should make your data publicly available at the time of publication. Some journals are already requiring this, and more are likely to follow.
Some would still balk at making data available at publication: what if I want to publish more papers with this dataset in the future? In that case, have an honest conversation with yourself. What do you mean by “future”? Are you really likely to follow through on those future projects that might use the dataset? If the answer is no, you should make the data available to enhance your chances for collaboration. If the answer is yes, give yourself a little temporal padding, but not too much. Consider enforcing a deadline of two years, at which point you make the data available whether or not you have finished those dream projects. Alternatively, find out whether your favorite data repository will enforce the deadline for you: you may be able to give them a release date for your data, and they will make it public on that date whether or not they hear from you first.