Neuroimaging as a case study in research data management: Part 1
Part 1: What we did and what we found
This post was originally posted on Medium.
How do brain imaging researchers manage and share their data? This question, posed rather flippantly on Twitter a year and a half ago, prompted a collaborative research project. To celebrate the recent publication of a bioRxiv preprint, here is an overview of what we did, what we found, and what we’re looking to do next.
What we did and why
Magnetic resonance imaging (MRI) is a widely-used and powerful tool for studying the structure and function of the brain. Because of the complexity of the underlying signal, the iterative and flexible nature of analytical pipelines, and the cost (measured in terms of both grant funding and person hours) of collecting, saving, organizing, and analyzing such large and diverse datasets, effective research data management (RDM) is essential in research projects involving MRI. However, while the field of neuroimaging has recently grappled with a number of issues related to the rigor and reproducibility of its methods, information about how researchers manage their data within the laboratory remains mostly anecdotal.
Within and beyond the field of neuroimaging, efforts to address rigor and reproducibility often focus on problems such as publication bias and sub-optimal methodological practices and solutions such as the open sharing of research data. While it doesn’t make for particularly splashy headlines (unlike, say, this), RDM is also an important component of establishing rigor and reproducibility. If experimental results are to be verified and repurposed, the underlying data must be properly saved and organized. Said another way, even openly shared data isn’t particularly useful if you can’t make sense of it. Therefore, in an effort to inform the ongoing conversation about reproducibility in neuroimaging, Ana Van Gulick and I set out to survey the RDM practices and perceptions of the active MRI research community.
https://twitter.com/JohnBorghi/status/758030771097636869
With input from several active neuroimaging researchers, we designed and distributed a survey that described RDM-related topics using language and terminology familiar to researchers who use MRI. Questions inquired about the type(s) of data collected, the use of analytical tools, procedures for transferring and saving data, and the degree to which RDM practices and procedures were standardized within laboratories or research groups. Building on my work to develop an RDM guide for researchers, we also asked participants to rate the maturity of both their own RDM practices and those of the field as a whole. Throughout the survey, we were careful to note that our intention was not to judge researchers with different styles of data management and that RDM maturity is largely orthogonal to the sophistication of data collection and analysis techniques.
Wait, what? A brief introduction to MRI and RDM.
Magnetic resonance imaging (MRI) is a medical imaging technique that uses magnetic fields and radio waves to create detailed images of organs and tissues. Widely used in medical settings, MRI has also become an important tool for neuroscience researchers, especially since the development of functional MRI (fMRI) in the early 1990s. By detecting changes in blood flow that are associated with changes in brain activity, fMRI allows researchers to non-invasively study the structure and function of the living brain.
Because there are so many perspectives involved, it is difficult to give a single comprehensive definition of research data management (RDM). But, basically, the term covers activities related to how data is handled over the course of a research project. These activities include, but are certainly not limited to, those related to how data is organized and saved, how procedures and decisions are documented, and how research outputs are stored and shared. Many academic libraries have begun to offer services related to RDM.
Neuroimaging research involving MRI presented something of an ideal case study for us to study RDM among active researchers. The last few years have seen a rapid proliferation of standards, tools, and best practice recommendations related to the management and sharing of MRI data. Neuroimaging research also crosses many topics relevant to RDM support providers such as data sharing and publication, the handling of sensitive data, and the use and curation of research software. Finally, as neuroimaging researchers who now work in academic libraries, we are uniquely positioned to work across the two communities.
What we found
After developing our survey and receiving the appropriate IRB approvals, we solicited responses to our survey during Summer 2017. A total of 144 neuroimaging researchers participated, and their responses revealed several trends that we hope will be informative for both neuroimaging researchers and data support providers in academic libraries.
As shown below, our participants indicated that their RDM practices throughout the course of a research project were largely motivated by immediate practical concerns, such as preventing the loss of data and ensuring access for everyone within a lab or research group, and limited by a lack of time and discipline-specific best practices.

We were relatively unsurprised to see that neuroimaging researchers use a wide array of software tools to analyze their often heterogeneous sets of data. What did surprise us somewhat was the different responses from trainees (graduate students and postdocs) and faculty on questions related to the consistency of RDM practices within their labs. Trainees were significantly less likely than faculty to say that practices related to backing up, organizing, and documenting data were standardized within their lab, which we think highlights the need for better communication about how RDM is an essential component of ensuring that research is rigorous and reproducible.
Analysis of RDM maturity ratings revealed that our sample generally rated their own RDM practices as more mature than the field as a whole and practices during the data collection and analysis phases of a project as significantly more mature than those during the data sharing phase. There are several interpretations of the former result, but the latter is consistent with the low level of data sharing in the field. Though these ratings provide an interesting insight into the perceptions of the active research community, we believe there is substantial room for improvement in establishing proper RDM across every phase of a project, not just after the data has already been analyzed.

For a complete overview of our results, including an analysis of how the field of neuroimaging is at a major point of transition when it comes to the adoption of practices including open access publishing, preregistration, and replication, check out our preprint now on bioRxiv. While you’re at it, feel free to peruse, reuse, or remix our survey and data, both of which are available on figshare.
Is this unique to MRI research?
Definitely not. Just as the consequences of sub-optimal methodological practices and publication biases have been discussed throughout the biomedical and behavioral sciences for decades, we suspect that the RDM-related practices and perceptions observed in our survey are not limited to neuroimaging research involving MRI.
To paraphrase and reiterate a point made in the preprint, this work was intended to be descriptive, not prescriptive. We also very consciously have not provided best practice recommendations because we believe that such recommendations would be most valuable (and actionable) if developed in collaboration with active researchers. Moving forward, we hope to continue to engage with the neuroimaging community on issues related to RDM and also expand the scope of our survey to other research communities such as psychology and biomedical science.
Additional Reading
Our preprint, one more time:
- Borghi, J. A., & Van Gulick, A. E. (2018). Data management and sharing in neuroimaging: Practices and perceptions of MRI researchers. bioRxiv.
For a primer on functional magnetic resonance imaging:
- Soares, J. M., Magalhães, R., Moreira, P. S., Sousa, A., Ganz, E., Sampaio, A., … Sousa, N. (2016). A hitchhiker’s guide to functional magnetic resonance imaging. Frontiers in Neuroscience, 10, 1–35.
For more on rigor, reproducibility, and neuroimaging:
- Nichols, T. E., Das, S., Eickhoff, S. B., Evans, A. C., Glatard, T., Hanke, M., … Yeo, B. T. T. (2017). Best practices in data analysis and sharing in neuroimaging using MRI. Nature Neuroscience, 20(3), 299–303. (Preprint)
- Poldrack, R. A., Baker, C. I., Durnez, J., Gorgolewski, K. J., Matthews, P. M., Munafò, M. R., … Yarkoni, T. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115–126. (Preprint)
Welcome to OA Week 2017!
By John Borghi and Daniella Lowenberg
It’s Open Access week and that means it’s time to spotlight and explore Open Data as an essential component to liberating and advancing research.

Let’s Celebrate!
Who: Everyone. Everyone benefits from open research. Researchers opening up their data provides access to the people who paid for it (including taxpayers!), patients, policy makers, and other researchers who may build upon it and use it to expedite discoveries.
What: Making data open means making it available for others to use and examine as they see fit. Open data is about more than just making the data available on its own; it is also about opening up the tools, materials, and documentation that describe how the data were collected and analyzed and why decisions about the data were made.
When: Data can be made open anytime a paper is published, anytime null or negative results are found, anytime data are curated. All the open data, all the time.
Where: If you are a UC researcher, free resources are available on each campus’s Research Data Management library website. Dash is a data publication platform to make your data open and archived for participating UC campuses, UC Press, and DataONE’s ONEShare. For more open data resources, check out our upcoming post on Wednesday, October 25th.
Why: Data are what support conclusions, discoveries, cures, and policies. Opening up articles for free access to the world is very important, but the articles are only so valuable without the data that went into them.
Follow this week as we cover policies, user stories, resources, economics, and justifications for why researchers should all be making their (de-identified, IRB approved) data freely available.
Tweet to us @UC3CDL with any questions, comments, or contributions you may have.
Upcoming Posts
Tuesday, October 24th: Open Data in Order to… Stories & Testimonials
Wednesday, October 25th: Policies, Resources, & Guidance on How to Make Your Data Open
Thursday, October 26th: Open Data and Reproducibility
Friday, October 27th: Open Data and Maximizing the Value of Research
Communication Breakdown: Nerds, Geeks, and Dweebs
Last week the DCXL crew worked on finishing up the metadata schema that we will implement in the DCXL project. WAIT! Keep reading! I know the phrase “metadata schema” doesn’t necessarily excite folks – especially science folks. I have a theory for why this might be, and it can be boiled down to a systemic problem I’ve encountered ever since becoming deeply entrenched in all things related to data stewardship: communication breakdown.
I began working with the DataONE group in 2010, and I was quickly overwhelmed by the rather steep learning curve I encountered related to data topics. There was a whole vocabulary set I had to learn, an entire ecosphere of software and hardware, and a hugely complex web of computer science-y, database-y, programming-y concepts to unpack. I persevered because the topics were interesting to me, but I often found myself spending time on websites that were indecipherable to the average intelligent person, or reading 50-page “quick start guides”, or getting entangled in a rabbit hole of Wikipedia entries for new concepts related to data.

I love learning, so I am not one to complain about spending time exploring new concepts. However, I would argue that my difficulties represent a much bigger issue plaguing advances in data stewardship: communication issues. It’s actually quite obvious why these communication problems exist. There are a lot of smart people involved in data, all of whom have very divergent backgrounds. I suggest that the smart people can be broken down into three camps: the nerds, the geeks, and the dweebs. These stereotypes should not be considered insults; rather, they are an easy way to refer to scientists, librarians, and computer types. Check out the full Venn diagram of nerds here.
The Nerds. This is the group to which I belong. We are specially trained in a field and have in-depth knowledge of our pet projects, but general education about computers, digital data, and data preservation is not part of our education. Certainly that might change in the near future, but in general we avoid the command line like the plague, prefer user-friendly GUIs, and resist any learning of new software, tools, etc. that might take away from learning about our pet projects.
The geeks. Also known as computer folks. These folks might be developers, computer scientists, information technology specialists, database managers, etc. They are uber-smart, but their uber-smart brains do not seem to work like mine. From what I can tell, geeks can explain things to me in one of two ways:
1. “To turn your computing machine on, you need to first plug it in. Then push the big button.”
2. “First go to bluberdyblabla and enter c>*#&$) at the prompt. Make sure the juberdystuff is installed in the right directory, though. Otherwise you need to enter #($&%@> first and check the shumptybla before proceeding.”
In all fairness, (1) occurs far less than (2). But often you get (1) after trying to get clarification on (2). How to remedy this? First, geeks should realize that our brains don’t think in terms of directories and command line prompts. We are more comfortable with folders we can color code and GUIs that allow us to use the mouse for making things happen. That said, we aren’t completely clueless. Just remember that our vocabularies are often quite different from yours. Often I’ve found myself writing down terms in a meeting so I can go look them up later. Things like “elements” and “terminal” are not unfamiliar words in and of themselves. However, the contexts in which they are used are completely new to me. That doesn’t even count the unfamiliar words and acronyms, like APIs, GitHub, Python, and XML.
The dweebs. Also known as librarians. These folks are more often being called “information professionals”, but the gist is the same – they are all about understanding how to deal with information in all its forms. There’s certainly a bit of crossover with the computer types, especially when it comes to data. However, librarian types are fundamentally different in that they are often concerned with information generated by other people: put simply, they want to help, or at least interact with, data producers. There are certainly a host of terms that are used more often by librarian types: “indexing” and “curation” come to mind. Check out the DCXL post on libraries from January.
Many of the projects in which I am currently involved require all three of these groups: nerds, geeks, and dweebs. I watch each group struggle to communicate their points to the others, and too often decide that it’s not worth the effort. How can we solve this communication impasse? I have a few ideas:
- Nerds: open your minds to the possibility that computer types and librarian types might know about better ways of doing what you are doing. Tap the resources that these groups have to offer. Stop being scared of the unknown. You love learning or you wouldn’t be a scientist; devote some of that love in the direction of improving your computer savvy.
- Geeks: dumb it down, but not too much. Recognize that scientists and librarians are smart, but potentially in very different ways than you. Also, please recognize that change will be incremental, and we will not universally adopt whatever you think is the best possible set of tools or strategies, no matter how “totally stupid” our current workflow seems.
- Dweebs: spend some time getting to know the disciplines you want to help. Toot your own horn – you know A LOT of stuff that nerds and geeks don’t, and you are all so darn shy! Make sure both geeks and nerds know of your capacity to help, and your ability to lend important information to the discussion.
And now a special message to nerds (please see the comment string below about this message and its potential misinterpretation). I plead with you to stop reinventing the wheel. As scientists have begun thinking about their digital data, I’ve seen a scary trend of them taking the initiative to invent standards, start databases, or create software. It’s frustrating to see since there is a whole set of folks out there who have been working on databases, standards, vocabularies, and software: librarians and computer types. Consult with them rather than starting from scratch.
In the case of dweebs, nerds, and geeks, working together as a whole is much much better than summing up our parts.
Academic Libraries: Under-Used & Under-Appreciated
I’m guilty. I often admit this when I meet librarians at conferences and workshops – I’m guilty of never using my librarians as a resource in my 13 years of higher ed, spread across seven academic institutions. At the very impressive MBL-WHOI Library in Woods Hole MA, there are quite a few friendly librarians that make their presence known to visitors. They certainly offered to help me, but it never occurred to me that they might be useful beyond telling me on what floor I can find the journal Limnology and Oceanography.
In hindsight, I didn’t know any better. Yes, we took the requisite library tour in grad school, and yes, I certainly used the libraries for research and access to books and journals, but no, I never talked to the librarians. Why is this? I have a few theories:
Librarians are terrible at self-promotion. Every time I meet a librarian, I’m awed and amazed by the vast quantities of knowledge they hold about all kinds of information. But most of the librarians I’ve encountered are unwilling to own up to their vast skill set. These humble folks assume scientists will come to them, completely underestimating the average academic’s stubbornness and propensity for self-sufficiency. In my opinion, librarians should stake out the popular coffee spot on campus and wear sandwich boards saying things like “You have no idea how to do research” or “Five minutes with me can change your <research> life”. Come on, librarians – toot your own horns!
Academics are trained to be self-sufficient. Every grad student has probably gotten the talk from their advisor at some point in their grad education. In my case the talk had phrases like these:
- “You don’t have to ask me EVERY time you want to run down to the supply room”
- “Which method do YOU think would work best?”
- “How should I know how to dilute that acid? Go figure it out!”
It only takes a couple of brush-offs from your advisor before you realize that part of learning to be a scientist involves solving problems all by yourself. This bodes well for future academic success, but does not allow us to entertain the idea that librarians might be helpful and save us oodles of time.
Google gives academics a false sense of security. Yes, I spend a lot of time Googling things. Much of this Googling occurs while having a drink with friends – some hotly debated item of trivia comes up, which requires that we pull out our smart phones to find out who’s right (it’s usually me). But Google can’t answer everything. Yes, it’s wonderful for figuring out who that actor in that movie was, or for showing a latecomer the amazing honey badger video. But Google is not necessarily the most efficient way to go about scholarly research. Librarians know this – they have entire schools dedicated to figuring out how to deal with information. The field of information science, which encompasses librarians, gives out graduate degrees in information. Do you really think that you know more about research than someone with a grad degree in information?? Extremely unlikely. Learn more about Information Science here.

This post does, in fact, relate to the DCXL project. If you weren’t aware, the DCXL project is based out of California Digital Library. It turns out that librarians are quite good at being stewards of scholarly communication; who better to help us navigate the tricky world of digital data curation than librarians?
This post was inspired by a great blog posted yesterday from CogSci Librarian: How Librarians Can Help in Real Life, at #Sci013, and more