Recently, I found myself wondering: what the heck is data governance? I was asked to participate in a workshop on Data Governance, supported by DataONE and led by MacKenzie Smith of Creative Commons and Trisha Cruse of UC3. I promptly replied “yes!”, pretending to understand the phrase, and then hurried back to my computer and Googled it.
Data governance is one of those phrases where you can define each of the words involved but remain unclear about what they mean when strung together. No need for you to start Googling: after participating in the data governance workshop in DC for the last couple of days, I can happily report what I learned and save you the effort.
First, let’s define data governance (based on Wikipedia’s entry): it’s the set of policies surrounding data, including data risk management, the assignment of roles and responsibilities for data, and, more generally, the formal management of data assets throughout the research cycle. Data governance issues include things like:
- data sharing licenses
- providing credit for data (see my post about data citation here)
- managing persistent identifiers (like those available via EZID)
- documenting data provenance
- sharing metadata to enable discovery
- establishing registries for standards and ontologies
Many scientists might think this is a rather dry set of topics (whether they are correct is a matter of opinion!). Scientists generally aren’t concerned about the policies surrounding data, and they have very little incentive to care. We have all signed copyright agreements when we publish in journals and patent agreements for our institutions (like this one for the UC system). But how many of us have read those documents? We have agreed to the terms and conditions of accepting funding, using institutional resources, publishing in journals, and engaging in collaborative research, but how many of us know what we have agreed to do with our data? My guess? Very close to zero.
The important point here is that we SHOULD care. In my conversations with scientists, I have discovered that most of them, if willing to share at all, would like to place restrictions on access to and use of their data. We need to be involved in those data governance discussions if we want to set the terms of our data sharing.
The data governance meeting was attended by 30 folks representing a wide range of perspectives. There were publishers, librarians, funders, scientists, data managers and a lawyer to offer up their ideas about how best to tackle the issues surrounding digital data. Examples of issues that surfaced:
- Who owns the data?
- Who is legally allowed to set the policies for data access and use?
- How are data affected by copyright law?
- How should we handle data that are used for meta-analysis, and therefore subject to many different policies?
- What is the implicit policy if none is specified?
- How should we educate the community of stakeholders about data governance?
We certainly didn’t solve all of the problems associated with data governance, but we made good headway on starting the conversation and encouraging further work in this area. I will expand on some of these topics in the next blog entry, so stay tuned! For a preview, check out this Storify record of the Twitter feed from the meeting.