Who “owns” your data?

CDL UC3

Tug of war, Kathleen Tyler Conklin, CC BY-NC 2.0

This post was originally published on the University of California Office of Scholarly Communication blog.

Which of these is true?

“The PI owns the data.”

“The university owns the data.”

“Nobody can own it; data isn’t copyrightable.”

You’ve probably heard somebody say at least one of these things — confidently. Maybe you’ve heard all of them. Maybe about the same dataset (but in that case, hopefully not from the same person). So who really owns research data? Well, the short answer is “it depends.”

A longer answer is that determining ownership (and whether there’s even anything to own) can be frustratingly complicated — and, even when obvious, ownership only determines some of what can be done with data. Other things like policies, contracts, and laws may dictate certain terms in circumstances where ownership isn’t relevant — or even augment or overrule an owner where it is. To avoid an unpleasant surprise about what you can or can’t do with your data, you’ll want to plan ahead and think beyond the simple question of ownership.

This is a long post. Here’s a quick roadmap of what’s ahead:

1. Instead of starting with ownership, think about rights and responsibilities.

We tend to use the word “ownership” when we talk about data because the word invokes familiar conventions; when we talk about physical property, ownership is synonymous with ultimate control and responsibility. But rights and responsibilities in data are often more granular, and different types of rights and responsibilities might be important to different individuals or organizations.

For instance:

Data sharing. Can — or must — the data be made public? Where? When? In what format? Under what kind of reuse terms — CC0, CC BY, something else?

Data access. Who can see the data? Who can use it, and for what purposes? Who can edit it? Who can get a copy? Who can give permission to other people?

Commercial use. Is someone planning on filing a patent claim? Producing software? Licensing the data to other users for a fee? Are there agreements with funders or partners forbidding or requiring commercial rights, or controlling how income is split?

Credit. Does a funder, partner, employer, or data provider require credit in publications arising from the data? In new datasets incorporating data obtained from them? In what format?

Preservation. Is there an obligation to maintain the data for others to use, or to request copies of? Where? For how long? In what format? Under what kind of security controls?

2. If you do need to figure out ownership, be prepared to argue about copyright.

What can you own? In a nutshell, you can own things that the law defines as property. Property includes real estate, objects like cars and computers, and intellectual property like patents.

Research data is often stored on computers or in notebooks — physical items that someone can own. Usually it’s pretty easy to figure out who owns those objects, but this isn’t what most people are talking about when they talk about owning data. It’s more likely, if they really mean ownership at all, that they mean ownership of intellectual property rights in the data, and probably just copyright. There’s a great article by Michael W. Carroll in PLOS (“Sharing Research Data and Intellectual Property Law: A Primer”) that discusses different types of IP rights in research data. A few highlights:

Why is copyright such a complicated issue for research data? Because facts aren’t copyrightable, but works of authorship are — and research data consist of one, or the other, or some combination, and often there’s room to argue about which.

Those are the basic outlines for assessing whether there are copyright interests to worry about. But keep in mind that, in cases where there is copyrightable expression, there may be multiple authors, transferring some of their rights or building layers on top of previous work. Throw into the mix the fact that some copyright owners will be individuals and some will be organizations, because an employer is considered the author of a work made for hire. “It’s complicated” is starting to feel like a serious understatement.

Let’s say, with all of that, you are fairly confident about a) whether the research data you’re interested in is eligible for copyright protection and b) who, if anyone, owns that copyright. Good job. Now what? You still probably don’t have all the information you need in order to figure out if you can take a copy of that data with you when you switch jobs, share it in a data repository, or delete it to free up space on your hard drive. Instead of following automatically from copyright ownership, these things are often controlled by contract or official policies.

3. See if policies or contracts provide support or create obstacles.

Realistically, you’re not going to start using phrases like “right to reproduction of the full dataset for commercial and noncommercial purposes” in your average conversation, and “owns” may be a reasonable shorthand. But in written documents, particularly when funding, employment, or the value of research to other potential users around the world may be at stake, it pays to be specific. Sit down and think carefully about the rights and responsibilities above and what you want to be able to do with your data; then start looking at documents that could conflict with your plans (or with each other).

Employment contracts and employer policies. The institution where you work may say specific things about whether data can be shared, copied, taken with you, or deleted. It may specify that patent ownership lies with the university and copyright ownership rests with employees. Or it may just say “data is the property of the university” without addressing what, exactly, is “owned.” Check out this interim guidance document from UCLA’s Office of Research Policy & Compliance for some examples of specific rights and obligations.

If you can’t find a relevant policy at your institution, or the one that you find doesn’t answer the questions that are important to you, ask for clarification. In writing. Better yet, ask for the policy to be updated with clearer language. If you don’t know who to talk to, skim your campus directory for a name like “Office of Research” or “Policy Office.” You can also try the library. They may have people who specialize in data management issues; even if they don’t, or don’t already know who to ask, they’re good at finding out stuff.

Grant agreements and funder policies. More and more government agencies are requiring that the research data they fund be publicly shared. Sometimes they specify where and when to share the data, or what types of confidential information to redact. We have a page that describes some of the basic requirements for US federal government funders, but the best source of information is the funder itself. Some of these requirements may not be general, or public, but written into a specific grant award. Here’s some example language from NOAA’s Text to be included in Notices of Award and Contracts of projects anticipated to generate environmental data or peer-reviewed publications:

Environmental data collected or created under this Grant, Cooperative Agreement, or Contract must be made publicly visible and accessible in a timely manner, free of charge or at minimal cost that is no more than the cost of distribution to the user, except where limited by law, regulation, policy, or national security requirements.

Data accessibility must occur no later than publication of a peer-reviewed article based on the data, or two years after the data are collected and verified, or two years after the original end date of the grant (not including any extensions or follow-on funding), whichever is soonest, unless a delay has been authorized by the NOAA funding program.
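To see how a "whichever is soonest" clause like this plays out, here is a minimal sketch in Python. It assumes you know the date the data were collected and verified, the original grant end date, and (optionally) a publication date. The function names are made up for illustration, and the sketch leaves out authorized delays and the legal or security exceptions the language mentions, so treat it as a rough planning aid rather than a reading of any actual award.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 landing in a non-leap year.
        return d.replace(year=d.year + years, day=28)


def sharing_deadline(
    data_verified: date,
    grant_end: date,
    publication: Optional[date] = None,
) -> date:
    """Earliest of the three triggers in the quoted award language.

    Hypothetical helper: ignores authorized delays and any
    legal, regulatory, or national security exceptions.
    """
    candidates = [
        add_years(data_verified, 2),  # two years after collection and verification
        add_years(grant_end, 2),      # two years after the original grant end date
    ]
    if publication is not None:
        candidates.append(publication)  # publication of a peer-reviewed article
    return min(candidates)


if __name__ == "__main__":
    deadline = sharing_deadline(
        data_verified=date(2023, 6, 1),
        grant_end=date(2024, 9, 30),
        publication=date(2025, 3, 15),
    )
    print(deadline.isoformat())
```

With these made-up dates, the publication date comes first, so it sets the deadline; without a publication, the two-year mark after verification would.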

Other contracts and policies. Employers and funders are the most common sources for granting or limiting rights or assigning responsibilities for data, but it’s a big world out there. Maybe you work in a lab that has its own policies or are part of a larger collaborative that has its own partnership agreement. Maybe you’re publishing with a journal that has data sharing requirements. Maybe, as discussed above, you’re re-using someone else’s data and you had to agree to certain terms to get it. Whatever the contract or policy is, if you’ve agreed to it — or you work in a group and someone representing your group agreed to it — or it’s an official policy and you’ve agreed to something that incorporates it by reference — it can limit what you can do with your data.

Now what? If no contracts or policies are an obstacle to what you want to do with your data, you’re in good shape. If there’s no intellectual property in your data, or if there is but you own it, or if you don’t but you have permission for what you want to do — ditto. But as you can see, ownership is a pretty small part of the picture.

4. Plan ahead to save yourself some stress.

No one wants to find themselves poring over contracts and policy documents on the verge of publication, a job change, or a funding renewal application. The best way to avoid this is to know up front what you want to do with your data, write it up clearly, and get it signed off on by your important stakeholders or anyone you’re worried might be a roadblock later, for example by using a data management plan.

A data management plan (DMP) can help you spot issues related to data before they become problems, and it’s a document increasingly required by grant funding agencies. The DMPTool has all kinds of guidance to get you started, as well as funder requirements and sample plans to look at. And it’s free.

So if, for example, you decide at the start that you want to post your data to DataONE with a CC0 waiver and you explain this in a DMP you submit to your funder (and you keep a copy of an email where you discuss this with your collaborators and your local office of research or tech transfer), then an issue like “how much copyrightable expression is there to own in this data?” is something you can probably avoid worrying about.

Say it with me: data management planning is my friend.

This post makes a couple of references to Creative Commons tools without going into any details. To hear more, come read the follow-up post about attribution, CC licenses, and why CC BY may not work how you expect it to with data.