Data management and data curation are related concepts, but they do not refer to precisely the same things. I use these terms so often now that the distinctions between them, fuzzy as they are, sometimes blur entirely. When this happens I return to visual abstractions to clarify, in my own mind, what I mean by one vs. the other. Data management is more straightforward and almost always comes in the guise of something like this:
The obligatory research data management life cycle slide. Everyone uses it, and I include it in just about every presentation I give these days. This simple (arguably oversimplified) but useful model defines more or less discrete data activities that correspond to different phases of the research process. It conveys what it needs to convey: namely, that data management is a dynamic cycle of activities that constantly influence one another. Essentially, we can envision a feedback loop.
Data curation, on the other hand, is a complex beastie. Standard definitions cluster around something like this one from the Digital Curation Centre in the UK:
Data curation involves maintaining, preserving, and adding value to digital research data throughout its lifecycle.
As a response to being pressed for a definition, this is certainly an elegant one. But, personally, I don’t find it helpful at all when I try to wrap my head around the myriad activities that go into curating anything, much less when I try to distinguish management activities from curation activities. Moreover, I’m talking about all kinds of activities in the context of “data,” a squishy concept in and of itself. (We’ll go with the NSF’s definition: the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.)
I suppose I should mention, sooner or later, why it matters how we define “data” and all the terms appended to it. There’s a lot of data, and we need to figure out what on earth to do with it; hence the proliferation of new positions with “data management” and “data curation” in their titles. It’s important to make sure we’re speaking the same language.
There are other, more expansive approaches to defining data curation, as well as a related post on this very blog, but to really grasp what I’m talking about when I say the words “data curation,” I invariably come back to this visualization created by Tim Norris. Tim is a geographer turned CLIR Postdoctoral Fellow in Data Curation at the University of Miami. Upon assuming a new post with an unfamiliar title, he decided to draw a map of his job to explain (to himself and to others) what he means by data curation. Many thanks to Tim for sharing this exercise with the rest of our CLIR cohort and now with the blogo-world-at-large.
Below is an abbreviated caption, in Tim’s own words, as well as short- (3 min) and long-format (9 min) tours of the map narrated by Tim. And here is a handy PNG file for those occasions when the looping life cycle visualizations just won’t do.
This map of data curation has two visual metaphors. The first is that of a stylized mandala: a drawing that implies both inwards and outwards motion that is in balance. And the second is that of a Zen koan: first there is a mountain, then there’s none, and then there is. We start with visual complexity: the mountain. To build the data curation mountain we start with a definition of the word “curation” as a five-step process that moves inwards. The final purpose of this curation is to move what is being curated back into the world for re-use, publication and dissemination. This can be understood as stewardship. Next we think about the sources of data in the outside world. These sources have been abstracted into three data spaces: library digital collections, external data sources, and research data products. As this data moves “inwards” we can think of verbs that describe the ingestion processes. Metadata creation, or describing the data, is a key that enables later data linkages to be identified, with the final goal of making data interoperable. Once the data is “inside” the curation space it passes through a standard process that begins with storage and ends with discovery. Specific to data in this process are the formats in which the data is stored and the difference between preservation and conservation for data. To enable this work we need hardware, software, and human interfaces to the curated data. Finally, as the data moves back out into the world, we must pay attention to institutions of property rights and access. If we get this all right we will have a system that is sustainable, secure, and increases the value of our research data collections. Once again we have a mountain.