
DataShare: A Plan to Increase Scientific Data Sharing

Posted in UC3

This post was co-authored by Dr. Michael Weiner, CIND director at UCSF.

The DataShare project is a collaboration between the University of California, San Francisco (UCSF) Clinical and Translational Science Institute, the UCSF Library, and the UC Curation Center (UC3) at the California Digital Library. The goal of the DataShare project is to achieve widespread voluntary sharing of scientific data at the time of publication. The project aims to do this by creating a data sharing website that all UCSF investigators could use, and ultimately others in the UC system and at other institutions. Currently, data sharing is done mostly by large, well-funded, multi-investigator projects. There would be great benefit if much more raw data were widely shared, especially data from individual investigators.

Imagine the possible scientific advances if we pooled our data the way “We Are the World” pooled celebrity voices. (Image from live.drjays.com)

This project is the brainchild of Michael Weiner, M.D., director of the Center for Imaging of Neurodegenerative Diseases. Weiner’s experience as Principal Investigator of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) led him to conclude that widespread data sharing can be achieved now, with great scientific and economic benefits. All ADNI raw data are shared immediately and without embargo (at UCLA/LONI/ADNI) with scientists throughout the world. The project has been very successful, with more than 300 publications and many more submitted. This success demonstrates the feasibility and benefits of sharing data.

Individual initiatives:

The laboratory at the Center for Imaging of Neurodegenerative Diseases began sharing data at the time of publication in 2011. This included both the raw data and a description of how those data were processed and analyzed to produce the published findings. For the DataShare project, the following expansions of data sharing are planned:

  1. ADNI scientists will be encouraged to share the raw data underlying their ADNI papers and other papers from their laboratories.
  2. Other faculty in the Department of Radiology at UCSF and our collaborators in Neurology and Psychiatry at UCSF will be encouraged to share their raw data.
  3. The Chancellor, Deans, and Department Chairs at UCSF will be urged to make widespread voluntary sharing of scientific data a UCSF priority and policy; this may include providing storage space for shared data and/or developing policies that reward data sharing in hiring and promotion.
  4. The example UCSF sets may then encourage the entire University of California system to implement similar changes.
  5. Collaborators and colleagues at other universities around the world will then be encouraged to adopt similar policies.
  6. A “data sharing impact factor” will be developed and tested; it will allow scientists to cite the data of others that they use and will provide metrics on how others are using their data.

Institutional initiatives:

The project seeks to engage the National Institutes of Health (NIH), the National Science Foundation (NSF), and the National Library of Medicine (NLM) in promoting and facilitating the sharing of scientific data. This will be accomplished through the following tasks:

  1. Encourage NIH and NSF to emphasize and expand their existing data sharing policies and to notify the scientific community of this greater emphasis.
  2. Promote the establishment of a small group of committed individuals who can help formulate NIH policy in this area, including a policy framework that favors open availability of scientific data.
  3. Establish technical mechanisms for data sharing, such as a national system for storing all raw scientific data (e.g., a national data repository or data bank). This repository might be created by NLM or housed at universities, foundations, or private companies; Dataverse is one existing example of such a repository.
  4. Work to develop incentives for scientists and institutions to share their raw data. These may include:
    1. Requesting data sharing reports in noncompetitive reviews, competitive reviews, and/or new applications
    2. Instructing reviewers to consider data sharing when assigning priority scores in grant reviews
    3. Acknowledgment in publications
    4. Providing affordable access to infrastructure that facilitates data sharing, such as software and media
    5. Encouraging NIH to fund small grants aimed at promoting and taking advantage of shared data, for example projects that use data mining or cloud computing

The potential gains from widespread sharing of raw scientific data greatly outweigh the relatively small costs of developing the necessary infrastructure. Industries likely to benefit from increased access to large amounts of raw data include the pharmaceutical and health care industries, chemistry, technology, and engineering. We also expect new technologies and companies to emerge to take advantage of the newly available data. Furthermore, widespread sharing of scientific data will bring substantial societal benefits, primarily because data sets can be linked and repurposed to make unforeseen discoveries.
