Neuroimaging as a case study in research data management: Part 1

John Borghi, February 19, 2018

Part 1: What we did and what we found

This post was originally published on Medium.

How do brain imaging researchers manage and share their data? This question, posed rather flippantly on Twitter a year and a half ago, prompted a collaborative research project. To celebrate the recent publication of a bioRxiv preprint, here is an overview of what we did, what we found, and what we’re looking to do next.

What we did and why

Magnetic resonance imaging (MRI) is a widely used and powerful tool for studying the structure and function of the brain. Because of the complexity of the underlying signal, the iterative and flexible nature of analytical pipelines, and the cost (measured in terms of both grant funding and person hours) of collecting, saving, organizing, and analyzing such large and diverse datasets, effective research data management (RDM) is essential in research projects involving MRI. However, while the field of neuroimaging has recently grappled with a number of issues related to the rigor and reproducibility of its methods, information about how researchers manage their data within the laboratory remains mostly anecdotal.

Within and beyond the field of neuroimaging, efforts to address rigor and reproducibility often focus on problems such as publication bias and sub-optimal methodological practices and on solutions such as the open sharing of research data. While it doesn’t make for particularly splashy headlines (unlike, say, this), RDM is also an important component of establishing rigor and reproducibility. If experimental results are to be verified and repurposed, the underlying data must be properly saved and organized. Said another way, even openly shared data isn’t particularly useful if you can’t make sense of it. Therefore, in an effort to inform the ongoing conversation about reproducibility in neuroimaging, Ana Van Gulick and I set out to survey the RDM practices and perceptions of the active MRI research community.

https://twitter.com/JohnBorghi/status/758030771097636869

With input from several active neuroimaging researchers, we designed and distributed a survey that described RDM-related topics using language and terminology familiar to researchers who use MRI. Questions inquired about the type(s) of data collected, the use of analytical tools, procedures for transferring and saving data, and the degree to which RDM practices and procedures were standardized within laboratories or research groups. Building on my work to develop an RDM guide for researchers, we also asked participants to rate the maturity of both their own RDM practices and those of the field as a whole. Throughout the survey, we were careful to note that our intention was not to judge researchers with different styles of data management and that RDM maturity is largely orthogonal to the sophistication of data collection and analysis techniques.

Wait, what? A brief introduction to MRI and RDM.

Magnetic resonance imaging (MRI) is a medical imaging technique that uses magnetic fields and radio waves to create detailed images of organs and tissues. Widely used in medical settings, MRI has also become an important tool for neuroscience researchers, especially since the development of functional MRI (fMRI) in the early 1990s. By detecting changes in blood flow that are associated with changes in brain activity, fMRI allows researchers to non-invasively study the structure and function of the living brain.

Because there are so many perspectives involved, it is difficult to give a single comprehensive definition of research data management (RDM). But, basically, the term covers activities related to how data is handled over the course of a research project. These activities include, but are certainly not limited to, those related to how data is organized and saved, how procedures and decisions are documented, and how research outputs are stored and shared. Many academic libraries have begun to offer services related to RDM.

Neuroimaging research involving MRI presented something of an ideal case for us to study RDM among active researchers. The last few years have seen a rapid proliferation of standards, tools, and best practice recommendations related to the management and sharing of MRI data. Neuroimaging research also crosses many topics relevant to RDM support providers, such as data sharing and publication, the handling of sensitive data, and the use and curation of research software. Finally, as neuroimaging researchers who now work in academic libraries, we are uniquely positioned to work across the two communities.

What we found

After developing our survey and receiving the appropriate IRB approvals, we solicited responses during Summer 2017. A total of 144 neuroimaging researchers participated, and their responses revealed several trends that we hope will be informative for both neuroimaging researchers and data support providers in academic libraries.

As shown below, our participants indicated that their RDM practices throughout the course of a research project were largely motivated by immediate practical concerns, such as preventing the loss of data and ensuring access for everyone within a lab or research group, and limited by a lack of time and of discipline-specific best practices.

What motivates and limits RDM practices in neuroimaging? When we asked active researchers, it turned out the answer was immediate and practical concerns. All values listed are percentages; participants could give multiple responses.

We were relatively unsurprised to see that neuroimaging researchers use a wide array of software tools to analyze their often heterogeneous sets of data. What did surprise us somewhat was the difference between the responses of trainees (graduate students and postdocs) and faculty on questions related to the consistency of RDM practices within their labs. Trainees were significantly less likely than faculty to say that practices related to backing up, organizing, and documenting data were standardized within their lab, which we think highlights the need for better communication about how RDM is an essential component of ensuring that research is rigorous and reproducible.

Analysis of RDM maturity ratings revealed that our sample generally rated their own RDM practices as more mature than those of the field as a whole and rated practices during the data collection and analysis phases of a project as significantly more mature than those during the data sharing phase. There are several interpretations of the former result, but the latter is consistent with the low level of data sharing in the field. Though these ratings provide an interesting insight into the perceptions of the active research community, we believe there is substantial room for improvement in establishing proper RDM across every phase of a project, not just after the data has already been analyzed.

Study participants rated their own RDM practices during the data collection and analysis phases of a project as significantly more mature than those of the field as a whole. Ratings for the data sharing phase were significantly lower than ratings for the data collection and analysis phases.

For a complete overview of our results, including an analysis of how the field of neuroimaging is at a major point of transition when it comes to the adoption of practices including open access publishing, preregistration, and replication, check out our preprint, now on bioRxiv. While you’re at it, feel free to peruse, reuse, or remix our survey and data, both of which are available on figshare.

Is this unique to MRI research?

Definitely not. Just as the consequences of sub-optimal methodological practices and publication biases have been discussed throughout the biomedical and behavioral sciences for decades, we suspect that the RDM-related practices and perceptions observed in our survey are not limited to neuroimaging research involving MRI.

To paraphrase and reiterate a point made in the preprint, this work was intended to be descriptive, not prescriptive. We have also very consciously not provided best practice recommendations because we believe that such recommendations would be most valuable (and actionable) if developed in collaboration with active researchers. Moving forward, we hope to continue to engage with the neuroimaging community on issues related to RDM and to expand the scope of our survey to other research communities, such as psychology and biomedical science.

Additional Reading

Our preprint, one more time:

Borghi, J. A., & Van Gulick, A. E. (2018). Data management and sharing in neuroimaging: Practices and perceptions of MRI researchers. bioRxiv.

For a primer on functional magnetic resonance imaging:

Soares, J. M., Magalhães, R., Moreira, P. S., Sousa, A., Ganz, E., Sampaio, A., … Sousa, N. (2016). A hitchhiker’s guide to functional magnetic resonance imaging. Frontiers in Neuroscience, 10, 1–35.

For more on rigor, reproducibility, and neuroimaging:

Nichols, T. E., Das, S., Eickhoff, S. B., Evans, A. C., Glatard, T., Hanke, M., … Yeo, B. T. T. (2017). Best practices in data analysis and sharing in neuroimaging using MRI. Nature Neuroscience, 20(3), 299–303. (Preprint)
Poldrack, R. A., Baker, C. I., Durnez, J., Gorgolewski, K. J., Matthews, P. M., Munafò, M. R., … Yarkoni, T. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115–126. (Preprint)