Since its founding, Dryad has hosted a researcher-led, open data publishing community and service. With the California Digital Library partnership in 2018, and reflecting on a decade of Dryad’s existence, we have spent time exploring what it means to remain a community-owned data publishing platform. By convening publishers, institutions, and other scholarly communications stakeholders to discuss the meaning of community ownership, we have begun to understand how research supporters see their role in the Dryad community and leadership. But to better understand the meaning of “researcher-led”, we wanted to hear researchers’ perspectives on community-led open infrastructure.
With the support of a National Science Foundation Community Meeting grant (award #1839032), we hosted a meeting on October 4th, 2019, with folks from the founding Dryad research communities. Going back to our roots, we gathered both researchers who founded Dryad and early career researchers in Ecology and Evolutionary Biology for a day-long event centered around asking a diverse group of researchers: what does it mean for Dryad to remain researcher-led?
Focusing on research perspectives
Kicking this off, we found it essential to hear from researchers themselves about how they use data, what their policies are, and how data re-use could be better suited to their use cases. Listening to researchers at different stages of their careers, we could see broad similarities but also meaningful variance: even within the Ecology and Evolutionary Biology fields, there are very different needs and uses for similar research data.
We explored these dynamics through a series of presentations. Ashley Asmus, a graduate student involved in the DroughtNet and NutNet projects, explained the large amount of data they depend on across 27 countries, which could benefit from a more mature data management infrastructure. Dr. Lizzie Wolkovich introduced her lab’s new data management policy, which requires open sharing of data. And Dr. Karthik Ram explained his perspective on what the data world could learn from the software world in terms of making things as easy as possible, with a bottom-up approach.
Dryad and the disciplinary repository landscape
Before diving into Dryad-specific discussions, we held a large-format discussion with guests from BCO-DMO, a repository for oceanographic data, as well as folks from the Arctic Data Center. Both are discipline-specific repositories funded by the National Science Foundation. It was evident that researchers do not feel they have proper guidance on which repository to use, even when funders feel this is clearly stated. Beyond being a mandate, submitting to these repositories is important for researchers because discipline-specific repositories typically provide richer curation than multi-disciplinary “general” repositories. A major theme that emerged was how Dryad and others that are embedded in the article publishing process could ensure submitted data are going to the right home.
Meeting user needs
Splitting the room based on user interests in submitting and publishing data or re-using data in Dryad, we turned the event space walls into post-it note exhibits. Researchers wrote down as many features and use cases as they could think of for either submitting data or using data. Within their groups, they then clustered and prioritized these features. Interestingly, the majority of participants chose to focus on data re-use, reflecting the change in open data acceptance amongst the community they represent. Some of the highest priority features in this arena were about integrations and the development of software tools that make the curated data more usable. For those focusing on submission, the top-rated features were around crediting back to funders and institutions, as well as links to the scripts and code used to analyze the data.
Maintaining a researcher-led community and platform
Circling back to the opening question, we prompted the group to think about what it means for researchers to be leading the Dryad community. Many of these perspectives centered around transparency in marketing, true costs, and added value. A big note was on how we can overcome barriers such as a lack of funding to publish data. Researchers raised the point that they may not be able to cover the cost of a data publishing charge, even at a respected US-based institution. Questions of how curation, integration, and open-source values can be inclusive of communities struggling for funding prompted us to consider how disparate and diverse scientific research may be, even within the same domain. We received innovative ideas related to business models for supporting a broader audience of researchers, as well as outreach ideas reflecting the need to integrate more deeply with the open-source software community.
Working in conjunction with the open repositories (BCO-DMO, Arctic Data Center) and repository networks (DataONE) present at the workshop, and continuing to be led by researchers in both governance and product management, Dryad and California Digital Library are striving to understand and promote proper practices for community ownership in open-source data publishing. While this was a one-day event, we aim to continue to engage with broader research communities, and we encourage any researcher to get in touch with us with feedback or ideas for getting involved in our community.
CDL and Dryad thank the National Center for Ecological Analysis and Synthesis (NCEAS) for giving us the space to hold this meeting as well as the National Science Foundation for granting meeting funds.