Category: UC3

Posts written by UC3 staff.

RFI for organizational identifier registry

Organizations/institutions are a key part of the scholarly communications ecosystem. However, we lack an openly licensed, independently run organizational identifier standard to use for common affiliation and citation use cases.

To define a solution to this problem, a group of interested parties drafted and shared a proposal at last year’s PIDapalooza. Based on that discussion, Crossref, DataCite, and ORCID announced the formation of an Organization Identifier Working Group earlier this year. UC3 has supported this effort, with our Director, John Chodacki, serving as chair of the Working Group.


Scope of Work

The primary goal of our working group (loosely codenamed OrgID or Open PIIR – Open Persistent Institutional Identifier Registry) is to build a plan for how best to fill this gap. Our main use case is facilitating the disambiguation of researcher affiliations.

The working group used a series of breakout groups to refine the structure, principles, and technology specifications for an open, independent, non-profit organization identifier registry. We worked in three interdependent areas: Governance, Product Definition, and Business Model, and recently released for public comment our findings and recommendations for governance and product requirements.

Summary of findings & recommendations

After nine months of work, the working group recommends the creation of an open, independent organization/institution identifier registry:

  • with capabilities for organizations/institutions to manage their own record,
  • seeded with and using open data,
  • overseen by an independent governance structure, and
  • incubated within a non-profit host organization/institution (providing technical development, operations and other support) during its initial start-up phase.

Request for Information

Our working group has now issued a Request for Information (RFI) to solicit comment and to hear from groups interested in hosting and/or developing this registry.

  • Are you interested in serving as the start-up host organization?
  • Do you have organization data you are willing to contribute?
  • Do you have other resources that could be helpful for the project?
  • Do you have advice, suggestions, and feedback on creating a sustainable business model for each phase of the Registry’s development?

We’d like to hear from you!  Please help spread the word!

Send your responses by 15 November, 2017

Before drafting responses, please also see our original A Way Forward document for additional framing principles. Also, please note that all responses will be reviewed by a subgroup of the Organization Identifier Working Group (that will exclude any RFI respondents).

If you have any other questions/comments, let us know:

The Significance of Managing Research Data

Some of the most influential research tools of the last century were created to ensure the quality of beer and extrapolate the results of agriculture experiments conducted in the English countryside. Though ostensibly about the placement of a decimal point, an ongoing debate about the application of these tools also provides a window for understanding what it actually means to manage research data.

The p-value: A very quick introduction

Though now ubiquitous in experiment-based research, statistical techniques for extending inferences from small samples (e.g. the participants in a research study) to larger populations are actually a relatively recent invention. The t-test, an early and still widely used example of “small sample” statistics, was developed by William Sealy Gosset in the early 20th century as an economical way of ensuring the quality of stout. Several years later, while assisting with long-term experiments on wheat and grass at Rothamsted Experimental Station, Ronald Fisher would build on the work of Gosset and others to develop a statistical framework based around the idea of comparing observations to the null hypothesis: the position that there is no significant difference between two or more specified sets of observations.

In Fisher’s significance testing framework, devices like t-tests are tests of the null hypothesis. The results of these tests indicate the likelihood of observing a result when the null hypothesis is true. The logic is a little tricky, but the core idea is that these tests give researchers a way of understanding the likelihood that their data is the result of sampling or experimental error. In quantitative terms, this likelihood is known as a p-value. In his highly influential 1925 book, Statistical Methods for Research Workers, Fisher would introduce an informal threshold for rejecting the null hypothesis: p < 0.05.

In one of the most influential sentences in modern research methodology, Ronald Fisher describes p = 0.05 as a convenient point for judging the significance of a statistical test. From: Fisher, R.A. (1925). Statistical Methods for Research Workers.
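To make the logic of significance testing concrete, here is a minimal sketch in Python. It uses an exact permutation test rather than the t-test discussed above, because the permutation version needs only the standard library and shows directly what a p-value counts: if group labels were arbitrary (the null hypothesis), how often would a difference at least as large as the observed one turn up? The measurements and the function name are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of splitting the pooled observations into two
    groups of the original sizes and returns the share of splits whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Invented measurements for two hypothetical batches (illustration only).
batch_a = [5.1, 4.9, 5.6, 5.8, 6.0]
batch_b = [4.1, 4.3, 3.9, 4.5, 4.0]

p_value = exact_permutation_p_value(batch_a, batch_b)
print(f"p = {p_value:.4f}")
```

With these made-up numbers every value in the first batch exceeds every value in the second, so only 2 of the 252 possible splits are as extreme as the observed one and p is roughly 0.008, below Fisher's informal 0.05 threshold.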

Despite the vehement objections of all three, Fisher’s work would later be synthesized with that of statisticians Jerzy Neyman and Egon Pearson into a suite of tools that are still widely used in many fields of research. In practice, p < 0.05 has since become a one-size-fits-all indicator of success. For decades it has been acknowledged that work that meets this criterion is generally more likely to be reported in the scholarly literature while work that doesn’t is generally relegated to the proverbial file drawer.

Beyond p < 0.05

The p < 0.05 threshold has become a flashpoint in the ongoing conversation about research practices, reproducibility, and replicability. Heated conversations about the use and misuse of p-values have been ongoing for decades, but over the summer a group of 72 influential researchers proposed a seemingly simple step forward: change the threshold from 0.05 to 0.005. According to the authors, “Reducing the p-value threshold for claims of new discoveries to 0.005 is an actionable step that will immediately improve reproducibility.”

As of this writing, two responses have been published. Both weigh the pros and cons of p < 0.005 and argue that the placement of a decimal point is less of a problem than the uncritical use of a single one-size-fits-all threshold across many different circumstances and fields of research. Both end on calls for greater transparency and stronger justifications for how decisions related to research design and statistical practice are made. If the initial paper proposed changing the answer from p < 0.05 to 0.005, both responses highlight the necessity of changing the question from one that is focused on statistics to one that incorporates research data management (RDM).

Ensuring that data can be used and evaluated in the future is one of the primary goals of RDM. For example, the RDM guide we’re developing does not have a space for assessing p-values. Instead, its focus is assessing and advancing practices related to planning for, saving, and documenting data and other research products. Such practices come with their own nuance, learning curves, and jargon, but are important elements of any effort to ensure that research decisions are transparent and justified.

Resources and Additional Reading

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., … & Cesarini, D. (2017). Redefine statistical significance. Nature Human Behaviour. doi: 10.1038/s41562-017-0189-z

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., … Zwaan, R. A. (2017). Justify your alpha: A response to “Redefine statistical significance”. PsyArXiv preprint. doi: 10.17605/OSF.IO/9S3Y6

McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2017). Abandon statistical significance. arXiv preprint. arXiv: 1709.07588.

Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. Journal of the American Statistical Association, 54(285), 30-34. doi: 10.1080/01621459.1959.10501497

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638-641. doi: 10.1037/0033-2909.86.3.638

Dat-in-the-Lab: Announcing UC3 research collaboration

We are excited to announce that the Gordon and Betty Moore Foundation has awarded a research grant to the California Digital Library and Code for Science & Society (CSS) for the Dat-in-the-Lab project to develop practical new techniques for effective data management in the academic research environment.


The project will pilot the use of CSS’s Dat system to streamline data preservation, publication, sharing, and reuse in two UC research laboratories: the Evolution: Ecology, Environment lab at UC Merced, focused on basic ecological and evolutionary research under the direction of Michael Dawson; and the Center for Watershed Sciences at UC Davis, dedicated to the interdisciplinary study of water challenges.  UC researchers are increasingly faced with demands for proactive and sustainable management of their research data with respect to funder mandates, publication requirements, institutional policies, and evolving norms of scholarly best practice.  With the support of the UC Davis and UC Merced Libraries, the project team will conduct a series of site visits to the two UC labs in order to create, deploy, evaluate, and refactor Dat-based data management solutions built for real-world data collection and management contexts, along with outreach and training materials that can be repurposed for wider UC or non-UC use.  

What is Dat?

The Dat system enables effective research data management (RDM) through continuous data versioning, efficient distribution and synchronization, and verified replication. Dat lets researchers continue to work with the familiar paradigm of file folders and directories yet still have access to rich, robust, and cryptographically secure peer-to-peer networking functions. You can think of Dat as doing for data what Git has done for distributed source code control. Details of how the system works are explained in the Dat whitepaper.
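As a rough illustration of what “verified replication” means in general terms, the sketch below compares cryptographic digests of a file and its copy using Python's standard library. This is not Dat's actual protocol or API (the whitepaper describes the real design); the file names, contents, and function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def replica_matches(source, replica):
    """A replica is considered verified when both digests are identical."""
    return file_digest(source) == file_digest(replica)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.csv"   # hypothetical lab data file
    copy = Path(tmp) / "copy.csv"       # hypothetical replica

    original.write_bytes(b"site,temp\nA,12.5\n")
    copy.write_bytes(original.read_bytes())
    intact = replica_matches(original, copy)

    # Simulate silent corruption of a single value in the replica.
    copy.write_bytes(b"site,temp\nA,12.6\n")
    corrupted = replica_matches(original, copy)

print(intact, corrupted)
```

The point of the sketch is only that a content digest lets a recipient confirm a copy is byte-for-byte identical without trusting the channel it arrived over.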

Project partners

Dat-in-the-Lab is the latest expression of CDL’s longstanding interest in supporting RDM at the University of California, and is complementary to other initiatives such as the DMPTool for data management planning, the Dash data publication service, and active collaboration with local campus-based RDM efforts.  CSS is a non-profit organization committed to improving access to research data for the public good, and works at the intersection of technology with science, journalism, and government to promote openness, transparency, and collaboration.  Dat-in-the-Lab activities will be coordinated by Max Ogden, CSS founder and director; Danielle Robinson, CSS scientific and partnerships director; and Stephen Abrams, associate director of the CDL’s UC Curation Center (UC3).

Learn more

Stay tuned for monthly updates on the project. You can bookmark Dat-in-the-Lab on GitHub for access to code, curricula, and other project outputs. Also follow along as the project evolves on our roadmap, chat with the project team, and keep up to date through the project Twitter feed. For more information about UC3, contact us at and follow us on Twitter.

NSF EAGER Grant for Actionable DMPs

We’re delighted to announce that the California Digital Library has been awarded a 2-year NSF EAGER grant to support active, machine-actionable data management plans (DMPs). The vision is to convert DMPs from a compliance exercise based on static text documents into a key component of a networked research data management ecosystem that not only facilitates, but improves the research process for all stakeholders.

Machine-actionable “refers to information that is structured in a consistent way so that machines, or computers, can be programmed against the structure” (DDI definition). Through prototyping and pilot projects we will experiment with making DMPs machine-actionable.

Imagine if the information contained in a DMP could flow across other systems automatically (e.g., to populate faculty profiles, monitor grants, notify repositories of data in the pipeline) and reduce administrative burdens. What if DMPs were part of active research workflows, and served to connect researchers with tailored guidance and resources at appropriate points over the course of a project? The grant will enable us to extend ongoing work with researchers, institutions, data repositories, funders, and international organizations (e.g., Research Data Alliance, Force11) to define a vision of machine-actionable DMPs and explore this enhanced DMP future. Working with a broad coalition of stakeholders, we will implement, test, and refine machine-actionable DMP use cases. The work plan also involves outreach to domain-specific research communities (environmental science, biomedical science) and pilot projects with various partners (full proposal text).
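To give a feel for what “programmed against the structure” could look like, here is a small Python sketch. It assumes a hypothetical JSON representation of a DMP; the field names, values, and the helper function are invented for illustration and do not reflect any agreed DMP schema.

```python
import json

# Hypothetical machine-readable DMP; every field name here is illustrative.
dmp_json = """
{
  "title": "Example project DMP",
  "funder": "NSF",
  "datasets": [
    {"name": "field-observations",
     "repository": "Example Repository",
     "expected_deposit": "2018-06-01"},
    {"name": "sensor-logs",
     "repository": null,
     "expected_deposit": null}
  ]
}
"""


def datasets_in_pipeline(dmp):
    """Return (name, repository, date) for datasets with a planned deposit."""
    return [
        (d["name"], d["repository"], d["expected_deposit"])
        for d in dmp["datasets"]
        if d["repository"] and d["expected_deposit"]
    ]


dmp = json.loads(dmp_json)
for name, repo, date in datasets_in_pipeline(dmp):
    # A real system might send a notification; this sketch only prints one.
    print(f"Notify {repo}: dataset '{name}' expected by {date}")
```

Because the plan is structured data rather than free text, the same record could in principle feed a repository notification, a grant dashboard, or a faculty profile without anyone re-keying it.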

Active DMP community

Building on our existing partnership with the Digital Curation Centre, we look forward to incorporating new collaborators and aligning our work with wider community efforts to create a future world of machine-actionable DMPs. We’re aware that many of you are already experimenting in this arena and are energized to connect the dots, share experiences, and help carry things forward. These next-generation DMPs are a key component in the globally networked research data management ecosystem. We also plan to provide a neutral forum (not tied to any particular tool or project or working group) to ground conversations and community efforts.

Follow the conversation @ActiveDMPs #ActiveDMPs and (forthcoming). You can also join the active, machine-actionable DMP community (live or remote participation) at the RDA plenary in Montreal and Force11 meeting in Berlin to contribute to next steps.

Contact us to get involved!


Co-Author ORCiDs in Dash

Recently, the Dash team enabled ORCiD login. And while this configuration is important for primary authors, the Dash team feels strongly that all contributors to data publications should get credit for their work.

All co-authors of a published dataset now have the ability to authenticate and attach their ORCiD in Dash.

How this works:

  1. Data are published by a corresponding author, who can authenticate their own ORCiD but cannot enter ORCiDs on behalf of co-authors. For this reason, Dash has a space for co-author email addresses to be entered.
  2. If email addresses are entered for co-authors, upon publication of the data, co-authors will receive an email notification. This notification will have a note about ORCiD iDs and a URL that directs to Dash.
  3. Co-authors who click on this URL will be directed to a pop-up box over the dataset landing page, which sends them to ORCiD for login and authentication.
  4. After an ORCiD iD is entered and authenticated, the author is returned to the Dash landing page for their dataset and their ORCiD iD will appear by their name.


Managing the new NIH requirements for clinical trials

As part of an effort to enhance transparency in biomedical research, the National Institutes of Health (NIH) have, over the last few years, announced a series of policy changes related to clinical trials. Though there is still a great deal of uncertainty about which studies do and do not qualify, these changes may have significant consequences for researchers who may not necessarily consider their work to be clinical or part of a trial.

Last September, the NIH announced a series of requirements for studies that meet the agency’s revised and expanded definition of a clinical trial. Soon after, it was revealed that many of these requirements may apply to large swaths of NIH-funded behavioral, social science, and neuroscience research that, historically, have not been considered to be clinical in nature. This was affirmed several weeks ago when the agency released a list of case studies that included a brain imaging study in which healthy participants completed a memory task as an example of a clinical trial.


NIH’s revised and expanded definition of clinical trials includes many approaches to human subjects research that have historically been considered basic research. (Source)

What exactly constitutes a clinical trial now?

Because many investigators doing behavioral, social science, and neuroscience research consider their work to be basic research and not a part of a clinical trial, it is worth taking a step back to consider how NIH now defines the term.

According to the NIH, clinical trials are “studies involving human participants assigned to an intervention in which the study is designed to evaluate the effect(s) of the intervention on the participant and the effect being evaluated is a health-related biomedical or behavioral outcome.” In an NIH context, intervention refers to “a manipulation of the subject or subject’s environment for the purpose of modifying one or more health-related biomedical or behavioral processes and/or endpoints.” Because the agency considers all of the studies it funds that investigate biomedical or behavioral outcomes to be health-related, this definition includes mechanistic or exploratory work that does not have direct clinical implications.

Basically, if you are working on an NIH-funded study that involves biomedical or behavioral variables, you should be paying attention to the new requirements about clinical trials.

What do I need to do now that my study is considered a clinical trial?

If you think your work may be reclassified as a clinical trial, it’s probably worth getting a head start on meeting the new requirements. Here is some practical advice about getting started.


The new NIH requirements for clinical trials affect activity throughout the lifecycle of a research project. (Source)

Applying for Funding

NIH has specified new requirements about how research involving clinical trials can be funded. For example, NIH will soon require that any application involving a clinical trial be submitted in response to a funding opportunity announcement (FOA) or request for proposal (RFP) that explicitly states that it will accept a clinical trial. This means that if you are a researcher whose work involves biomedical or behavioral measures, you may have to apply to funding mechanisms that your peers have argued are not necessarily optimal or appropriate. Get in touch with your program officer and watch this space.

Grant applications will also feature a new form that consolidates the human subjects and clinical trial information previously collected across multiple forms into one structured form. For a walkthrough of the new form, check out this video.

Human Subjects Training

Investigators involved in a clinical trial must complete Good Clinical Practice (GCP) training. GCP training addresses elements related to the design, conduct, and reporting of clinical trials and can be completed via a class or course, academic training program, or certification from a recognized clinical research professional organization.

In practice, if you have already completed human subjects training (e.g. via CITI) and believe your research may soon be classified as a clinical trial, you may want to get proactive about completing the couple of additional modules.

Getting IRB Approval

Good news if you work on a multi-site study: NIH now expects that you will use a single Institutional Review Board (sIRB) for ethical review. This should help streamline the review process, since it will no longer be necessary to submit an application to each site’s individual IRB. This requirement also applies to studies that are not clinical trials.

Registration and Reporting

NIH-funded projects involving clinical trials must be registered on In practice, this means that the primary investigator or grant awardee is responsible for registering the trial no later than 21 days after the enrollment of the first participant and is required to submit results information no later than a year after the study’s completion date. Registration involves supplying a significant amount of information about a study’s planned design and participants while results reporting involves supplying information about the participants recruited, the data collected, and the statistical tests applied. For more information about, check out this paper.
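The two deadlines described above are simple date arithmetic, which a lab could track alongside its other project milestones. The sketch below is only an illustration: the function names are hypothetical, and it assumes “a year” means the same calendar date in the following year, so the policy text itself remains the authority on how the dates are counted.

```python
from datetime import date, timedelta


def registration_deadline(first_enrollment: date) -> date:
    """Latest registration date: 21 days after the first participant enrolls."""
    return first_enrollment + timedelta(days=21)


def results_deadline(completion: date) -> date:
    """Latest results submission date, assuming 'a year' means the same
    calendar date one year later (29 February falls back to 28 February)."""
    try:
        return completion.replace(year=completion.year + 1)
    except ValueError:
        return completion.replace(year=completion.year + 1, day=28)


# Hypothetical study dates, for illustration only.
first_participant = date(2018, 1, 10)
study_completed = date(2019, 6, 30)

print("Register by:", registration_deadline(first_participant))
print("Report results by:", results_deadline(study_completed))
```

Recording these dates when the first participant enrolls, rather than reconstructing them later, is one small way to reduce the administrative scramble.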

If you believe your research may soon be reclassified as a clinical trial, now is probably a good time to take a hard look at how you and your lab handle research data management. The best way to relieve the administrative burden of these new requirements is to plan ahead and ensure that your materials are well organized, your data is securely saved, and your decisions are well documented. The more you think through how you’re going to manage your data and analyses now, the less you’ll have to scramble to get everything together when the report is due. If you haven’t already, now would be a good time to get in touch with the data management, scholarly communications, and research IT professionals at your institution.

Dash Enables ORCiD Login

The Dash team has now added a second way to log in and submit. In addition to using Single Sign-On, users now have the ability to log in with ORCiD. This means that not only can you authenticate with ORCiD, but once you have logged in this way, your ORCiD iD will connect to your Dash account. The next time you submit to Dash, your ORCiD iD will auto-populate in your submission form.

To back up a little: ORCiD is a persistent identifier used to distinguish researchers from one another, and connect researchers with their research. If you are a researcher and do not currently have an ORCiD, sign up!
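For readers curious about the identifier itself, an ORCID iD is a 16-character string whose final character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below checks that structure offline. It only validates the format and does not confirm that an iD is registered or authenticated, which is what the Dash login flow does; the function names are hypothetical.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the string has 16 characters (ignoring hyphens) and
    the last one matches the computed check character."""
    compact = orcid.replace("-", "")
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15].upper()


# 0000-0002-1825-0097 is a commonly used sample iD.
print(is_well_formed_orcid("0000-0002-1825-0097"))
print(is_well_formed_orcid("0000-0002-1825-0098"))
```

A check like this can catch typos in a manually entered iD, but authenticated login remains the reliable way to confirm that an iD belongs to the person entering it.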

To connect your ORCiD:

  1. Log in using the button on the far right of the Dash homepage.
  2. Here you will see two options. Clicking the top ORCiD button will send you out to the ORCiD authentication page and, after you correctly enter your ORCiD info, send you back to Dash.
  3. Although you have now successfully authenticated with ORCiD, to ensure you are connected to your correct submitting instance (a campus, a department, DataONE, etc.) you will be asked to choose your Single Sign-On. This is the only time you will be asked to log in twice.
  4. After successfully logging in with Single Sign-On you will have your account connected to your ORCiD. In the future, you will not need to repeat this process and instead you will either be able to save your login to your browser or choose one of the two options for logging in.

If you have already submitted to Dash before, you may log out and go through the same steps above. This process will tie your ORCiD to your existing account and allow for either ORCiD or Single Sign-On in the future.

What We Talk About When We Talk About Reproducibility

At the very beginning of my career in research I conducted a study which involved asking college students to smile, frown, and then answer a series of questions about their emotional experience. This procedure was based on several classic studies which posited that, while feeling happy and sad makes people smile and frown, smiling and frowning also makes people feel happy and sad. After several frustrating months of trying and failing to get this to work, I ended my experiment with no significant results. At the time, I chalked up my lack of success to inexperience. But then, almost a decade later, a registered replication report of the original work also showed a lack of significant results and I was left to wonder if I had also been caught up in what’s come to be known as psychology’s reproducibility crisis.


Campbell’s Soup Cans (1962) by Andy Warhol. Created by replicating an existing object and then reproducing the process at least 32 times.

While I’ve since left the lab for the library, my work still often intersects with reproducibility. Earlier this year I attended a Research Transparency and Reproducibility Training session offered by the Berkeley Institute for Transparency in the Social Sciences (BITSS) and my projects involving brain imaging data, software, and research data management all invoke the term in some way. Unfortunately, though it has always been an important part of my professional activities, it isn’t always clear to me what we’re actually talking about when we talk about reproducibility.

The term “reproducibility” has been applied to efforts to enhance or ensure the research process for at least 25 years. However, related conversations about how research is conducted, published, and interpreted have been ongoing for more than half a century. Ronald Fisher, who popularized the p-value that is so central to many modern reproducibility efforts, summed up the situation in 1935.

“We may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us statistically significant results.”

Putting this seemingly simple statement into action has proven to be quite complex. Some reproducibility-related efforts are aimed at how researchers share their results, others are aimed at how they define statistical significance. There is now a burgeoning body of scholarship devoted to the topic. Even putting aside terms like HARKing, QRPs, and p-hacking, seemingly mundane objects like file drawers are imbued with particular meaning in the language of reproducibility.

So what actually is reproducibility?

Well… it’s complicated.

The best place to start might be the National Science Foundation, which defines reproducibility as “The ability of a researcher to duplicate the results of a prior study using the same materials and procedures used by the original investigator.” According to the NSF, reproducibility is one of three qualities that ensure research is robust. The other two, replicability and generalizability, are defined as “The ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected.” and “Whether the results of a study apply in other contexts or populations that differ from the original one.” respectively. The difference between these terms is in the degree of separation from the original research, but all three converge on the quality of research. Good research is reproducible, replicable, and generalizable and, at least in the context of the NSF, a researcher invested in ensuring the reproducibility of their work would deposit their research materials and data in a manner and location where they could be accessed and used by others.

Unfortunately, defining reproducibility isn’t always so simple. For example, according to the NSF’s terminology, the various iterations of the Reproducibility Project are actually replicability projects (muddying the waters further, the Reproducibility Project: Psychology was preceded by the Many Labs Replication Project). However, the complexity of defining reproducibility is perhaps best illustrated by comparing the NSF definition to that of the National Institutes of Health.

Like the NSF, NIH invokes reproducibility in the context of addressing the quality of research. However, unlike the NSF, the NIH does not provide an explicit definition of the term. Instead NIH grant applicants are asked to address rigor and reproducibility across four areas of focus: scientific premise, scientific rigor (design), biological variables, and authentication. Unlike the definition supplied by the NSF, NIH’s conception of reproducibility appears to apply to an extremely broad set of circumstances and encompasses both replicability and generalizability. In the context of the NIH, a researcher invested in reproducibility must critically evaluate every aspect of their research program to ensure that any conclusions drawn from it are well supported.

Beyond the NSF and NIH, there have been numerous attempts to clarify what reproducibility actually means. For example, a paper out of the Meta-Research Innovation Center at Stanford (METRICS) distinguishes between “methods reproducibility”, “results reproducibility”, and “inferential reproducibility”. Methods and results reproducibility map onto the NSF definitions of reproducibility and replicability, while inferential reproducibility includes the NSF definition of generalizability and also the notion of different researchers reaching the same conclusion following reanalysis of the original study materials. Other approaches focus on methods by distinguishing between empirical, statistical, and computational reproducibility or specifying that replications can be direct or conceptual.

No really, what actually is reproducibility?

It’s everything.

The deeper we dive into defining “reproducibility”, the muddier the waters become. In some contexts, the term refers to very specific practices related to authenticating the results of a single experiment. In other contexts, it describes a range of interrelated issues related to how research is conducted, published, and interpreted. For this reason, I’ve started to move away from explicitly invoking the term when I talk to researchers. Instead, I’ve tried to frame my various research and outreach projects in terms of how they relate to fostering good research practice.

To me, “reproducibility” is about problems. Some of these problems are technical or methodological and will evolve with the development of new techniques and methods. Some of these problems are more systemic and necessitate taking a critical look at how research is disseminated, evaluated, and incentivized. But fostering good research practice is central to addressing all of these problems.

Especially in my current role, I am not particularly well equipped to speak to whether a researcher should define statistical significance as p < 0.05, p < 0.005, or K > 3. What I am equipped to do is to help a researcher manage their research materials so they can be used, shared, and evaluated over time. It’s not that I think the term is not useful, but the problems conjured by reproducibility are so complex and context dependent that I’d rather just talk about solutions.

Resources for understanding reproducibility and improving research practice

Goodman A., Pepe A., Blocker A. W., Borgman C. L., Cranmer K., et al. (2014) Ten simple rules for the care and feeding of scientific data. PLOS Computational Biology 10(4): e1003542.

Ioannidis J. P. A. (2005) Why most published research findings are false. PLOS Medicine 2(8): e124.

Kitzes, J., Turek, D., & Deniz, F. (Eds.). (2017). The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press.

Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., du Sert, N. P., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 0021.

Wilson G., Bryan J., Cranston K., Kitzes J., Nederbragt L., et al. (2017) Good enough practices in scientific computing. PLOS Computational Biology 13(6): e1005510.

Dash: The Data Publication Tool for Researchers

This post has been crossposted on Medium

We all know that research data should be archived and shared. That’s why we created Dash, a data publishing platform free to UC researchers. Dash complies with journal and funder requirements, follows best practices, and is easy to use. In addition, new features are continuously being developed to better integrate with your research workflow.

Why Dash is the best solution for UC researchers:

  • Data are archived indefinitely. You can use Dash to ensure all of your research data will be available even after you get a new computer or switch institutions. Beyond that, your data will have all the important associated documentation on the funding sources for the research, the research methods and equipment used, and readme files on how your data were processed, so future researchers from your own lab or around the world can use your work.
  • Data can be published at any time. While we do have features that assist with affiliated article publication, like keeping your data private during the review process, Data Publications do not need to be associated with an article. Publish your data at any point in time.
  • Data can be versioned. As you update and optimize protocols, or do further analysis on your data, you may update your data files or documentation. Your DOI will always resolve to a landing page listing all versions of the dataset.
  • Data can be uploaded to Dash directly from your computer or through a “manifest”. “Manifest” means you may enter up to 1000 URLs where your data are living on servers, Box, Dropbox, or Google Drive and the data will be transferred to Dash without waiting several hours or dealing with timeouts.
  • You can upload up to 100 GB of data per submission.
  • Dash does not limit file type. So long as the data are within the size limits listed above, publications can be image data, tabular data, qualitative data, etc…
  • Related works can be linked. Code, articles, other datasets, and protocols can be linked to your data for a more comprehensive package of your research.
  • Data deposited to Dash receive a DOI. This means that not only can your data be located but you can cite your data as you would articles. The landing page for each dataset includes an author list for your citation as well, so each author who contributed to the data collection and analysis may receive credit for their work.
  • Data are assigned an open license. Deposited data are publicly available for re-use by anyone under a Creative Commons license. You put many hours and coffees into producing these data, and public release will give your research a broader reach. A light reminder that your name is still associated with the data, and making your data public does not mean you are “giving away” your work.
  • Dash is a UC project. Dash can be customized per campus. Many campus libraries are subsidizing the cost of storage, and it is developed by the University of California Curation Center (UC3), meaning this service is set up to serve your needs.
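If you plan to use the manifest option described above, it can help to sanity-check your list of URLs before submitting. The snippet below is only a rough sketch and not part of Dash: the function name `check_manifest` is made up for illustration, the 1000-URL limit comes from the list above, and the restriction to http/https links is our assumption rather than a documented Dash rule.

```python
from urllib.parse import urlparse

MAX_URLS = 1000  # per-manifest limit mentioned in the feature list above


def check_manifest(urls):
    """Hypothetical helper: return a list of problems found in a manifest.

    An empty list means the manifest passed these basic checks.
    """
    problems = []
    # Ignore blank lines and surrounding whitespace.
    cleaned = [u.strip() for u in urls if u.strip()]
    if len(cleaned) > MAX_URLS:
        problems.append(f"too many URLs: {len(cleaned)} > {MAX_URLS}")
    for u in cleaned:
        parsed = urlparse(u)
        # Assumption: only http/https links with a host are accepted.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not a web URL: {u}")
    return problems
```

For example, `check_manifest(["https://example.org/data.csv", "data.csv"])` would flag the second entry because it is a local file name rather than a web address.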

We hear a lot about the cost of storage being an inhibitor. But, on many campuses, the storage costs associated with Dash are subsidized by academic libraries or departments. The cost of storage could also be written into grants (as funders do require data to be archived).

We are always looking for feedback on what features would be the most useful, so that we can make data publishing a part of your normal workflows. Get in touch with us or start using Dash to archive and share your data.

From Brain Blobs to Research Data Management

If you spend some time browsing the science section of a publication like the New York Times you’ll likely run across an image that looks something like the one below: A cross section of a brain covered in colored blobs. These images are often used to visualize the results of studies using a technique called functional magnetic resonance imaging (fMRI), a non-invasive method for measuring brain activity (or, more accurately, a correlate of brain activity) over time. Researchers who use fMRI are often interested in measuring the activity associated with a particular mental process or clinical condition.


A visualization of the results of an fMRI study. These images are neat to look at but not particularly useful without information about the underlying data and analysis.

Because of the size and complexity of the datasets involved, research data management (RDM) is incredibly important in fMRI research. In addition to the brain images, a typical fMRI study involves the collection of questionnaire data, behavioral measures, and sensitive medical information. Analyzing all this data often requires the development of custom code or scripts. This analysis is also iterative and cumulative, meaning that a researcher’s decisions at each step along the way can have significant effects on both the subsequent steps and what is ultimately reported in a presentation, poster, or journal article. Those blobby brain images may look cool, but they aren’t particularly useful in the absence of information about the underlying data and analyses.

In terms of both the financial investment and researcher hours involved, fMRI research is quite expensive. Throughout fMRI’s relatively short history, data sharing has been proposed multiple times as a method for maximizing the value of individual datasets and for overcoming the field’s ongoing methodological issues. Unfortunately, a very practical issue has hampered efforts to foster the open sharing of fMRI data: researchers have historically organized, documented, and saved their data (and code) in very different ways.

What we are doing and why

Recently, following concerns about sub-optimal statistical practices and long-standing software errors, fMRI researchers have begun to cohere around a set of standards regarding how data should be collected, analyzed, and reported. From a research data management perspective, it’s also very exciting to see an emerging standard for how data should be organized and described. But, even with these emerging standards, our understanding of the data-related practices actually employed by fMRI researchers in the lab, and of how those practices relate to data sharing and other open science-related activities, remains mostly anecdotal.

To help fill this knowledge gap and hopefully advance some best practices related to data management and sharing, Dr. Ana Van Gulick and I are conducting a survey of fMRI researchers. Developed in consultation with members of the open and reproducible neuroscience communities, our survey asks researchers about their own data-related practices, how they view the field as a whole, their interactions with RDM service providers, and the degree to which they’ve embraced developments like registrations and pre-prints. Our hope is that our results will be useful both for the community of researchers who use fMRI and for data service providers looking to engage with researchers on their own terms.

If you are a researcher who uses fMRI and would like to complete our survey, please follow this link. We estimate that the survey should take between 10 and 20 minutes.

If you are a data service provider and would like to chat with us about what we’re doing and why, please feel free to either leave a comment or contact me directly.