
Research Data Publishing Redefined

By: Daniella Lowenberg 

In considering the title for this post, I struggled to narrow the range of activities I work on down to a specific name, and have landed on research data publishing (which is timely, now four years after my first blog for UC3 defining data publishing). The reason this is difficult is that while data publishing may be a relatively new topic of the last decade, it is often tied specifically to depositing data in a repository. But the activities that I believe define data publishing are:

In short: it’s not one component (e.g., managing a repository). Specializing in data publishing means focusing on and leading the development of communities of practice around open standards, open infrastructure, and easily understood and accessible workflows for researchers. 

So, what’s the vision for 2021? Well, adoption of course. But in the following areas:

Seamless Connections 

A big area that we’ve invested in is our partnership with Dryad. When considering adoption for Dryad, we are referring to both the researcher and research supporting communities. Seamlessly connecting researchers and research supporters is essential for our globally shared goal of making open data a more commonly accepted and well-done practice. A lot of this requires education (shout out to UC’s Love Data Week), but as Product Manager for Dryad I am also thinking through and prioritizing how services and workflows can be educational and as easy as possible. Our iterative product strategy is focused on collaborations and integrations that will make data publishing more seamless and better connected to other components of the research process. We are looking forward to launching our integrations with Zenodo for software, eJournalPress and Editorial Manager for journals, and Frictionless Data for increased quality of our datasets. In terms of the research supporting communities, we are working to build better connections among the growing set of funders, publishers, and institutions that have long supported or recently joined the Dryad community.

Open Data Metrics

It’s become increasingly clear that we need a way to evaluate the reach and re-use of openly published research data. The Make Data Count initiative is continuing to build the social and technical infrastructure for open data metrics. Beginning this year, a team of bibliometricians at the University of Ottawa and ZBW are initiating qualitative and quantitative studies on data citation and re-use across various scientific disciplines that will influence the development of proper indicators. We are also beginning to map services that will be broadly accessible for repositories to standardize and aggregate data usage and citation.

Daniella’s book club recommendation —  Open Data Metrics: Lighting the Fire.

Research Data Publishing Ethics

Coming from the journal publishing side of open research, I joined the UC3 team wondering about publication ethics and how to best position our data publishing initiatives, sensitive to the various issues that can arise with research publication. Spearheading a new FORCE11 Working Group, we are proud to launch the Research Data Publishing Ethics WG. This group, which began informally a year ago with folks from agencies, repositories, publishers, preprint advocates, and research integrity experts, will develop community norms and proposed workflows for data publishers to consider (e.g., publishing identifying information, considering legal standards across countries, authorship disputes). Please join if you have interest in contributing to the development of these standards or would like to follow the conversations!

While these are a few highlighted areas that we will be focused on in the new year, we are always interested in collaborating or generating new ideas around research data publishing and how to best support researchers in the advancement of their discoveries. If we have learned anything in COVID times, we know that this space is essential. 

This blog is a part of the “A Peek Into 2021 for UC3” series


Open Data Metrics: Lighting the Fire

CDL has been working to promote responsible development and use of data metrics through the Make Data Count initiative and other community projects. Daniella Lowenberg and John Chodacki recently published a book capturing their thoughts on the matter. Check it out!

http://opendatametrics.org

Research data is at the center of science, and to date it has been difficult to understand its impact. To assess the reach of open data, and to advance data-driven discovery, the research and research supporting communities need open, trusted data metrics.

In Open Data Metrics: Lighting the Fire, the authors propose a path forward for the development of data metrics. They acknowledge historic players and milestones in the process and demonstrate the need for standardized, transparent, community-led approaches to establish open data metrics as the new normal.

NSF Workshop Overview: Focusing on Researcher Perspectives

Since its founding, Dryad has hosted a researcher-led, open data publishing community and service. With the California Digital Library partnership in 2018, and reflecting on a decade of Dryad’s existence, we have spent time exploring what it means to remain a community-owned data publishing platform. By convening publishers, institutions, and other scholarly communications stakeholders to discuss the meaning of community-ownership, we have begun to understand how research-supporters see their role in the Dryad community and leadership. But to better understand the meaning of “researcher-led”, we wanted to hear about researchers’ perspectives on community-led open infrastructure. 

With the support of a National Science Foundation Community Meeting grant (award #1839032), we hosted a meeting  on October 4th, 2019, with folks from the founding Dryad research communities. Going back to our roots, gathering both researchers that founded Dryad as well as early career researchers in Ecology and Evolutionary Biology, we held a day-long event centered around asking a diverse group of researchers: what does it mean for Dryad to remain researcher-led?

Focusing on research perspectives 

Kicking this off, we found it essential to hear from researchers themselves on how they use data, what their policies are, and how data re-use could be better suited to their use cases. Listening to researchers at different stages of their careers, we could see broad similarities but also meaningful variance: even within the Ecology and Evolutionary Biology fields there are very different needs and uses for similar research data.

We explored these dynamics through a series of presentations. Ashley Asmus, a graduate student involved in the DroughtNet and NutNet projects, explained the large amount of data they depend on across 27 countries, which could benefit from more mature data management infrastructure. Dr. Lizzie Wolkovich introduced her lab’s new data management policy, which requires open sharing of data. And Dr. Karthik Ram explained his perspective on what the data world could learn from the software world about making things as easy as possible, with a bottom-up approach.

Dr. Karthik Ram presenting on his experience working with open source software

Dryad and the disciplinary repository landscape

Before diving into Dryad-specific discussions, we took time to have a large-format discussion with guests from BCO-DMO, a repository for oceanographic data, and the Arctic Data Center, both National Science Foundation-funded, discipline-specific repositories. It was evident that researchers do not feel they have proper guidance on which repository to use, even when funders feel this piece is clearly stated. Beyond it being a mandate, it’s important for researchers to submit to these repositories because discipline-specific repositories typically provide richer curation than multi-disciplinary “general” repositories. A heavy theme that emerged was how Dryad and others that are embedded in the article publishing process could ensure submitted data are going to the right home.

Meeting user needs

Splitting the room based on user interests in submitting and publishing data or re-using data in Dryad, we turned the event space walls into post-it note exhibits. Researchers wrote down as many features and use cases as they could think of for either submitting data or using data. Within their groups they then clustered and prioritized these features. Interestingly, the majority of participants chose to focus on data re-use, reflecting the change in open data acceptance amongst the community they represent. Some of the highest priority features in this arena were about integrations and development of software tools that make the curated data more usable. For those focusing on submission, the top-rated features were around crediting back to funders and institutions, as well as relations to the scripts and code used to analyze the data.

Dr. Sally Otto representing the “Publishing Data” group discussion
Researchers clustering and prioritizing data re-use features

Maintaining a researcher-led community and platform

Circling back to the opening question, we prompted the group to think about their perceptions of what it means for researchers to be leading the Dryad community. Many of these perspectives centered around transparency in marketing, true costs, and the added values. A big note was on how we can overcome barriers such as lack of funding to publish data. Researchers raised the point that they may not be able to cover the cost of a data publishing charge, even at a respected US-based institution. Questions of how curation, integration, and open-source values can be inclusive of communities struggling for funding prompted us to consider how disparate and diverse scientific research may be, even within the same domain. We received innovative ideas related to business models for supporting a broader audience of researchers, as well as outreach ideas reflecting the need to integrate deeper within the open-source software community.

Working in conjunction with the open repositories (BCO-DMO, Arctic Data Center) and repository networks (DataONE) present at the workshop, and continuing to be led by researchers through governance and product management, Dryad and California Digital Library are striving to both understand and promote proper practices for community ownership in open source data publishing. While this was a one-day event, we aim to continue to engage with broader research communities, and we encourage any researcher to get in touch with us if you have feedback or ideas for how you can get involved in our community.

CDL and Dryad thank the National Center for Ecological Analysis and Synthesis (NCEAS) for giving us the space to hold this meeting as well as the National Science Foundation for granting meeting funds.

We Can’t Succeed Alone

Within the realm of research data management, libraries spend resources building and providing tools that are not within researcher workflows and/or are not aligned with researcher values.  By doing this, we are setting ourselves up for failure.

As mentioned in previous posts, part of my work is focused on incentivizing UC researchers to publish their data. And it has taught me that we are failing as an institutional and research stakeholder community to move the needle on the adoption of data publishing. Why? Because we are not listening to researchers. To get to the root of this issue, I have spent hours working with researchers to determine which features would get them to deposit, what we could do differently to get them to change their processes, etc.  Unsurprisingly, a common response has been that integrations within existing workflows (ones that researchers are already familiar with or those driven by the research process) would be key.

Why is adoption key?

You may ask what I mean by “adoption of data publishing” and why it is so key. Of course, there are natural answers – we want our tools and services to be used. But for me, the adoption we need to be focused on goes deeper to mean more than just awareness or usage of tools. It should mean that researchers are publishing their data and valuing their data as a first-class research output. Of course article publishing has its flaws that we would like to avoid repeating, but researchers value articles like currency. Research data are the science underlying those articles and they deserve to be valued, as well.

Adoption of data publishing also means that quality, curated datasets have understandable, dataset-specific metadata and preservation assurance that the data will persist. These are features that we as institutions value. These are the features that institutions include in their own repositories. These value points are why we would not consider a PDF of an Excel sheet or a copy of summary statistics a data publication. Successful adoption means a culture change where researchers are publishing their understandable and usable research data (employing institutional and community best practices), citing their data, and valuing their data publications like their articles.

We are nowhere near this type of success. However, it is essential that we continue to work together to drive this kind of adoption. If we do not work together, we will continue to build resources full of subsets of these features, maybe meeting our institutional needs but not adopted by researchers. And if we fail at this adoption campaign, we fail to call the open data movement a success.

Powerful adoption goals

With this view of adoption as my guide, I had our Dash dev team build a robust API that could handle the different types of integrations researchers need. I went to publishers, online lab notebooks, and computing spaces, and showed them how easy it would be to integrate with our tools. I was trying to break this adoption logjam, and it was my hope that, acting on behalf of UC, the largest research institution in the world, I would be able to deliver on our researchers’ feedback. I set goals for new deposits in the thousands and thought we had the answers that could get us to those targets. But I found another hurdle: not even UC has the scale to make an outside vendor interested in integrating. And after one year of trying, we were nowhere near reaching our adoption goals.

CDL is not alone. At CNI last week I asked a room of institutional stakeholders whether any of their institutional platforms had more than 500 deposits. Not one of the 40+ people in the room raised their hand. This raises the question: if we are spending years of time for a couple hundred deposits, how can any of our institutions call this a success? All of us at institutions need to be self-reflective and evaluate ourselves against realistic success metrics. CDL has been in this boat. While my team did not want to hear that we had failed to meet our goals after years of building an awesome platform for the UCs, it was the truth.

So we took a step back. We re-evaluated the feedback from our UC researchers and regrouped on whether we should put energy into a project destined for minimal success. The new question we asked ourselves: how can we continue to support our institutional values for research data publishing and get to the scale we want?

Acknowledging the truth brings new ventures

We spent a lot of time rethinking our motivations as libraries and as research institutions: we want high-quality datasets, the most PID-ified deposits possible, scale, ease of use, integrations, etc. We also spent time thinking about the motivations of our researchers: they want ease of use and low friction, with peace of mind that they are meeting requirements and doing the right thing.

Along the journey we concluded that serving just one institutional community was not a plausible way to drive adoption of data publishing, and my team began to look at successes in our wider community. We did a repository comparison based on features and values. What became extremely clear was that Dryad was, and is, clearly aligned with what we were trying to achieve. Not only does Dryad have a similar mission statement, but Dryad has an indisputable amount of adoption and researcher support. They are part of the researcher workflow and they are focused on high-quality, curated datasets. These are all the things we all want, right?

As a matter of fact, when researchers at UC said “your UC-specific Dash platform sounds cool, but can I just put my data in Dryad like my collaborators?”, I would say yes, because publishing data in Dryad is better than fighting over data territory. But as we looked for ways of achieving the scale and success Dryad had achieved, we realized that what was needed was for us to better support Dryad as part of our institutional solution. So, after months of vetting and discussions on both sides, CDL decided to partner with Dryad. And in this partnership, I get to walk the walk. Dryad’s submissions are on the rise and researchers have long supported Dryad.

Building for success

What we were missing was relevance and scale. What Dryad was missing was enhanced and innovative features to optimize its existing connections into researcher workflows. But as we embark on our new Dryad service, we need to be reminded that Dryad’s other differentiator is that it works for researchers. And we need to remember that we need scale… so join us!

While we plan to leverage the new partnership to help UC and all member institutions grow adoption, we can’t build solely for our library and scholarly communications needs. The key word here is solely. Of course we as a library community should proudly lead the way in bringing better metadata, metrics, and infrastructure to support the discoverability, use, and preservation of research outputs. But if our end goal is to support researchers, we cannot prioritize our build-out of services in a researcher-less silo.

The new Dryad

This means that the new Dryad should be able to connect into institutions like UC, reflecting their values, while remaining a researcher-focused service. As product manager, my goal for this new Dryad service is:

  1. To focus all of our energy on adoption
  2. To build out each feature to be user centered and tested
  3. To be a seamless data publishing platform, integrated into research and publishing workflows
  4. To add institution tools in ways that are transparent to and benefit the researcher

What does this all require? I need to ensure that we are on the pulse of publisher data policy workflows, integrating with computing spaces and notebooks, and continually in alignment with funder requirements (and thinking ahead to integrations with preprints and other funder-required spaces). This is where our library and scholarly communications expertise comes in. If we can build up an open community of support to ensure we are only building for best practices, we can ensure that our features and services are twofold: instilled with our values and presented in the easiest way for submitters.

Vision for the community 

So let’s build an open community. A community of supporters who would like to focus strictly on researcher adoption of best data practices. No jargon. No discussions about back-end technology. A community that leverages each other’s projects that are aligned in values. A community that embraces a diverse set of tools to help grow adoption.

Let’s follow the principles of the Supporters Guide. It is possible for us all to work together, regardless of technologies, to ensure that our services and offerings in the research supporting community are bound by our values and built for researcher use. More on that in a future post.

How To Link Dash Data Publications With Your ORCiD Profile

Dash, the data publishing platform, is integrated with ORCiD, an author disambiguation service, in a couple of ways: you can log in to Dash with ORCiD, and you (and your co-authors) can display your ORCiD on dataset landing pages.

But, let’s take a step back. 

What is an ORCiD? It is a unique identifier for you as a researcher. Increasingly, funders, publishers, and institutions are requiring ORCiD iDs as a way to identify researchers and track research output. If you’re submitting articles to journals, or other research output like data and code, take a minute to get yourself an ORCiD and connect all of your research output!
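As a side note for the curious: an ORCiD iD is a 16-character identifier (e.g., 0000-0002-1825-0097) whose final character is a check digit, computed with the ISO 7064 MOD 11-2 algorithm that ORCID documents. A small Python sketch of validating an iD:

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ORCID check digit (ISO 7064 MOD 11-2)
    from the first 15 digits of the identifier."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check that a hyphenated ORCID iD has a correct checksum."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    return orcid_check_digit(digits[:15]) == digits[15]


# ORCID's own documentation example iD:
print(is_valid_orcid("0000-0002-1825-0097"))  # True
```

This is only a format check, of course; the identifier still has to be registered at orcid.org to resolve to a real profile.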

What is the benefit?

ORCiD is an identifier you can use for article and data publishing workflows, and it is also a public profile of your research work. It is a great way to display and track all of your research work.

How does this all relate to Dash?

As mentioned above, Dash is integrated with ORCiD for login and credit purposes. But for ORCiD to properly display datasets submitted to Dash, you will need a DataCite profile.

A quick step to ensure your data publications automatically appear on your ORCiD profile

DataCite mints the Dash DOI (which can be used for access and citation of your dataset). By following this simple DataCite guide, you can grant permission for DataCite to send your dataset DOI information back to ORCiD. After you have adjusted your permissions, datasets you submit to any repository that utilizes DataCite will display on your ORCiD profile just like an article.
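Because a Dash DOI is registered with DataCite, its public metadata (including any ORCiD iDs attached to the creators) can be retrieved from DataCite’s REST API. As an illustration, here is a small Python sketch that pulls creators’ ORCiD iDs out of a record shaped like a DataCite response; the field names follow DataCite’s JSON:API format, but check the current API docs before relying on them:

```python
def orcids_from_datacite(doc: dict) -> list[str]:
    """Extract creators' ORCID iDs from a DataCite REST API
    DOI record (JSON:API shape; field names per DataCite docs)."""
    ids = []
    creators = doc.get("data", {}).get("attributes", {}).get("creators", [])
    for creator in creators:
        for ni in creator.get("nameIdentifiers", []):
            if ni.get("nameIdentifierScheme") == "ORCID":
                ids.append(ni["nameIdentifier"])
    return ids


# A trimmed sample record, shaped like a DataCite response:
sample = {
    "data": {"attributes": {"creators": [
        {"name": "Carberry, Josiah",
         "nameIdentifiers": [
             {"nameIdentifierScheme": "ORCID",
              "nameIdentifier": "https://orcid.org/0000-0002-1825-0097"}]}
    ]}}
}
print(orcids_from_datacite(sample))
```

In practice you would fetch the record itself from https://api.datacite.org/dois/{doi} rather than building it by hand.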


Dash Releases First Submission REST API

Over the last year the Dash team has spent time surveying the community on incentives for, and ways to drive adoption of, data publishing practices. Many of the barriers have been around the ease of submitting data and the fact that data publishing sits outside status-quo research workflows. To help address this, the Dash team has implemented our first Submission REST API. Our hope is that this is the first step towards opening up integration opportunities with electronic lab notebooks and publishers, and allowing research data to be submitted from analytical environments. The first release of our API allows a user to publish a new dataset, or version an existing one, with metadata and receive a citable DOI. With versioning in place, users can now update data dynamically.
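As a rough sketch of how such a submission call might look from a client’s side, here is a Python example; note that the endpoint path, payload field names, and bearer-token auth below are illustrative assumptions, not the documented Dash API, so see the “How-To” guide for the real contract:

```python
import json
import urllib.request


def build_submission(title, authors, abstract):
    """Assemble a minimal dataset-metadata payload.
    Field names here are illustrative; the Dash "How-To"
    guide defines the real schema."""
    return {
        "title": title,
        "authors": [{"firstName": f, "lastName": l} for f, l in authors],
        "abstract": abstract,
    }


def submit(payload, token, base_url="https://dash.example.org/api"):
    """POST a new dataset (hypothetical endpoint and auth scheme).
    The response would include the citable DOI."""
    req = urllib.request.Request(
        f"{base_url}/datasets",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_submission(
    "Example dataset", [("Ada", "Lovelace")], "A short abstract.")
print(payload["title"])
```

Versioning an existing dataset would presumably be a similar call against the dataset’s identifier rather than the collection endpoint.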

To get started, check out our technical “How-To” guide. If you have any questions, feedback or would like to discuss integrations, please get in touch at uc3@ucop.edu.

From Networking to Curation: Summing Up the 2018 Data Curation Unconference

Authors: Vessela Ensberg (UC Davis), Jeanine Finn (Claremont Colleges), Greg Janée (UC Santa Barbara/California Digital Library), Amy Neeser (UC Berkeley), Scott Peterson (UC Berkeley)

The 2018 data curation unconference took place prior to the UCDLFx meeting at UC Riverside. Nearly 30 attendees representing 11 institutions signed up for the event. Over the course of three hours we completed two sessions and discussed seven problem statements. Topics ranged from building relationships to exchanging hands-on project experience. Below we highlight valuable shared experience and steps for moving forward in addressing these challenges.

How do you form long-term relationships with units outside the library? What do you propose when you initiate the conversation? What kind of involvement do you have in their work and vice versa?

The key to establishing these relationships is to identify units with a similar research support mission.  These relationships tend to be between the library and IT units, Grant Support or the Office of Research, educational technology centers, digital learning centers, campus learning groups, student makerspace groups or library student advisory groups. Some examples of collaborators are BIDS and DLab at UC Berkeley and CRESP at UC Riverside.

Meeting with these groups can identify any gaps in service. Collaboration between departments can also create a pool of consultants, wherein library staff contribute their time and expertise to a growing pool of experts available for consultations with various researchers. Another area for collaboration is co-teaching in different instruction areas. Finally, there are opportunities for joint events. UC Berkeley holds consulting summits twice a year for all consultants from different departments to get together and talk in a semi-organized format that sometimes resembles an unconference, but can also be more focused, with reports given from various working groups. Examples of presented projects could be implementing Docker in an instructional setting or how to acquire datasets and make them available.

Another approach to strengthening relationships with external units is to invest jointly in resources or positions. Due to the complexities associated with such collaborations, the parties may want to start with a proof of concept or a pilot project for a tool, or by hiring a limited-term position such as a CLIR Fellow.

In all approaches, it is important to clearly communicate the goals, and how outcomes will be assessed, to everyone involved. In other words, the group agrees on what success looks like. A steering committee of stakeholders can be useful for identifying the correct goals. These agreements should be documented, sometimes as an MOU. It is important to keep in mind that sustainability is key, and relationships require maintenance.

How do you work with non-traditional data for archiving, e.g., relational databases and digital humanities products?

There are three types of challenges in archiving non-traditional products: communication, resources, and the need for workflows for novel digital research output. In addition to requiring the usual technological and human-resource investment to process the materials, there are accommodations to be made for usability. For example, users expect to be able to stream video or to search a database, while the process of archiving a database requires flattening it. The challenge is only going to grow as more faculty engage in Digital Humanities, which raises the question of how their research output will be archived once they retire.

There are some solutions available to UC users. For instance, eScholarship now allows video streaming. Still, we need to meet the need for storage of and access to the dissemination information package, which is actively used and retains many relevant functionalities, in addition to the archival information package, which is rarely accessed. Ideally, we will see further integration of existing platforms (Dash and eScholarship) to enhance discovery. Perhaps a UC network can provide the infrastructure necessary to take the archival service a step further.

Have you assisted faculty by doing hands-on work with their data?

One paradigm for how libraries can interact with researchers in the area of data curation is the library providing consultation (only), e.g., giving advice on data management plans and repository selection.  Another paradigm is the library acquiring (and assuming ownership of) the researcher’s data, and turning it into a library collection. Is there a middle ground? Are there paradigms in which the library plays a more active role in the handling and processing of researcher data?

The consensus that emerged out of a discussion of our collective experiences is that libraries generally do not perform hands-on work with researcher data.  To the extent that librarians have worked with data, the focus has been strictly on pre-ingest, higher-level review. There are a variety of reasons for this, including limited resources and sustainability—no surprises there—but also the degree of faculty interest and the need for sufficient discipline knowledge.  Some institutions (University of Michigan was noted) have policies that explicitly state (and limit) the degree of librarian involvement in faculty research.

It is in the area of metadata that libraries have played a much more hands-on role.  There is ample precedent for librarians assisting with metadata preparation and review, particularly for the metadata backing dataset landing pages, since it is the metadata that directly supports library discovery services.

Ultimately, it was agreed that “success looks like researchers having the skills to curate their own data.”

How do we make publicly available data discoverable, integrate them into websites, and help people find them?

There are two potential users who need to discover data: the casual user who is browsing and the researcher looking for deep datasets. Since users don’t think of the library catalog when they are looking for data, we need to utilize other means of communicating dataset locations. For example, research is often done on the causes and consequences of current events. With that need in mind, UC Davis’ Michele Tobias communicated about generating a set of maps of the boundaries of American Viticulture Areas via a blog post. The post was discovered by a journalist, who followed up on the dataset. Even though the dataset was not ultimately featured in the publication, this demonstrates how writing a story that connects events with datasets supports their discovery. In a similar vein, clearly linking datasets with articles will assist their discovery for research.

It is also important to keep in mind how our patrons search. Since many users start with Google, dataset metadata should be discoverable that way, for example through schema.org markup for datasets or DataCite.
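One concrete way to make dataset pages discoverable through Google is to embed schema.org Dataset markup as JSON-LD in the landing page. A minimal sketch in Python (all values below are placeholders, not a real record):

```python
import json

# Minimal schema.org "Dataset" description; every value is a placeholder.
dataset_jsonld = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "American Viticulture Area Boundaries",
    "description": "Geospatial boundaries of American Viticulture Areas.",
    "identifier": "https://doi.org/10.xxxx/placeholder",  # dataset DOI
    "creator": {"@type": "Person", "name": "Jane Researcher"},
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

# A landing page would embed this inside:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(dataset_jsonld, indent=2))
```

Repository platforms that generate this markup from their existing metadata get crawler-friendly discovery essentially for free.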

How do we better educate and advocate for data curation services with researchers?

The first step for successful advocacy is building relationships, following the discussion outlined in the beginning of this blog. In addition to the Office of Research and IT, Management Service Offices (MSOs) were identified as important partners assisting with outreach. We should seek to make it easier for our partners to communicate with their networks about our services, and provide them with a clear and eloquent message about what we offer. The message used for outreach needs to emphasize the free-of-charge services that set libraries apart and frame the library as a resource for data services. The services are designed to make researchers’ work easier.

Successfully assisting researchers provides word-of-mouth marketing that is very powerful. Examples of success stories that raised the data services profile among researchers included the file transfer service at UCSD, providing training in data skills and using training to promote services, and exploring the consequences of the claim that research data belong to the UC Regents. Personal experience that results in a researcher vouching for the library results in a persuasive and impactful message.

What would a UC expertise consortium for RDM/curation look like?

We combined a long-term vision and a pragmatic approach in tackling this question. The Data Curation Network provided an inspiring example of how subject-specific expertise can be exchanged across partnering institutions. After Lisa Johnson’s presentation for the CKG Deep Dive in September 2017, this collaborative model was probably on the minds of many participants. To move towards this long-term vision, we proposed two actions: an exchange of educational materials and a catalog of expertise.

For the educational materials, including curriculum and training, we can use the CKG Google Drive folder. To use those effectively, we will also need a catalog and shared definitions. We can also share workflows in the same manner. Similarly, we wanted to have a catalog of skills relevant to working with research data and find out who in the Library possesses them. After some discussion we settled on undertaking this step within the members of the CKG. We will develop a survey instrument and apply for IRB approval to distribute it to CKG members and ask them to describe their skill set and the skillset of their team. Going forward, we envision presenting a larger project proposal to DOC.

How do we engage students and involve them?

There is no simple answer here. Participants discussed a variety of experiences that have met with some success in developing student engagement with data and data curation practices. The key theme was that library data services advocates must *both* welcome students into library spaces to take advantage of services *and* go out to meet students where they are. Students, graduate and undergraduate alike, are producing and publishing their research in greater numbers. We need to find ways to engage with them in the venues where they are already working and sharing their research products. For example, we discussed attending research poster presentations and asking authors about their funding plans as a way of introducing the library’s services for data management planning. Additionally, library programs that are not at first glance “data centric” can provide a gateway to a larger conversation about data management and preservation. Examples of these activities included workshops on establishing a scholarly identity/setting up an ORCID, developing OA educational materials, and learning how to use citation manager software. Connections with student organizations (like the graduate student organization) can also help maintain relationships as students graduate.

We Are Talking Loudly and No One Is Listening

By Daniella Lowenberg

“Listening is not merely not talking, though even that is beyond most of our powers; it means taking a vigorous, human interest in what is being told to us” — Alice Duer Miller

A couple of months ago I wrote about how we need to advocate for data sharing and data management with more focus on adoption, and eliminate discussions about technical backends. I thought this was the key to getting researchers to change their practices and make data available as part of their normal routines. But there’s more to change than just not arguing over platforms: we need to listen.

We are talking loudly and saying nothing.

I routinely visit campuses to lead workshops on data publishing (both train-the-trainer sessions for librarians and sessions for researchers). Regardless of the material presented, there are always two different conversations happening in the room. At each session, librarians pose technical questions about backend technologies and integrations with scholarly publishing tools (e.g., ORCID). These are great questions for a scholarly publishing conference but confusing for researchers. This is how workshops start:

Daniella: “Who knows what Open Access is?”

Fewer than 50% of the researchers in the room raised their hands.

Daniella: “Has anyone here been asked to share their data, or does anyone understand what this means?”

Fewer than 20% of the researchers in the room raised their hands.

Daniella: “Does anyone here know what an ORCID is, or have one?”

One person in total raised their hand.

We are talking too loudly and no one is listening.

We have characterized ‘Open Data’ as a success because incentives exist and authors write data availability statements, but this misconception has allowed the library community to focus on scholarly communications infrastructure instead of continuing to work on the issue at hand: sharing research data is not well understood, well incentivized, or accessible. We need to focus our efforts on listening to the research community about what their processes are and how data sharing could fit into them, and then take that as guidance in advocating for our library resources to become part of lab norms.

We need to be focusing our efforts on education around HOW to organize, manage, and publish data.

Change will come when organizing data to be shared throughout the research process is a norm. Our goal should be to grow adoption of sharing and managing data and as a result see an increase in researchers knowing how to organize and publish data. Less talk about why data should be available, and more hands-on getting research data into repositories, in accessible and researcher-desirable ways.

We need to only build tools that researchers WANT.

The library community has lots of ideas about what the priorities in the data world should be right now, such as curation, data collections, and badges, but we are getting ahead of ourselves. While these initiatives may be shinier and more exciting, it feels like we are polishing marathon trophies before runners can finish a one-mile jog. And we’re not doing a good job of understanding their perspectives on running in the first place.

Before we can convince researchers that they should care about library curation and ‘FAIR’ data, we need to get them to think of managing and publishing data as a normal part of research. This begins with organization at the lab level and finding ways to integrate data publishing systems into lab practice without disrupting normal activity. When researchers are worried about finishing their experiments, publishing, and their careers, simply naming platforms they should be using is neither an effective nor a helpful solution. What is effective is finding ways to relieve publishing pain points and make the process easier. Tools and services for researchers should be understood as ways to make their research and publishing processes easier.

“When you listen, it’s amazing what you can learn. When you act on what you’ve learned, it’s amazing what you can change.” — Audrey McLaughlin

Librarians: this is a space where you can make an impact. Be the translators. Listen to what researchers want, understand their day-to-day work, and translate for infrastructure builders and policy makers what would make effective tools and incentives. If we spend less time and fewer resources building tools, services, and guides that will never be utilized or appreciated, we can be effective in our jobs by refocusing on the needs of the research community as expressed by the research community. Let’s act like scientists and build evidence-based conclusions and tools. The first step is to engage with the research community in a way that allows us to gather that evidence. And if we do, maybe we can start translating for an audience that wants to learn scholarly communication tools and language, and we can each achieve our goal of making research available, usable, and stable.

Dash: 2017 in Review

The goal for Dash in 2017 was to build out features that would make Dash a desirable place to publish data. While we continue to work with the research community to find incentives to publish data generally, the small team of us working on Dash wanted to take a moment to thank everyone who published data this year.

In 2017 we worked in two week sprint intervals to release 26 features and instances (not including fixes).

In 2018 we have one major focus: integrate into researcher workflows to make publishing data a more common practice.

To do so we will be working with the community to:

Follow along with our GitHub and Twitter, and please get in touch with us if you have ideas or experiences to share for making data publishing a more common practice in the research environment.

Where’s the adoption? Shifting the Focus of Data Publishing in 2018

By Daniella Lowenberg

At RDA10 in Montreal I gave a presentation on Dash in the Repository Platforms for Research Data IG session. The session was focused on backend technology and technology communities for repository platforms. I talked a bit about the Dash open source software and features but walked away thinking “How productive is it to discuss software systems to support research data at length? Is adoption based on technology?”

The answers are: not productive, and no.

Following RDA10, I spent months talking with as many researchers and institutions as possible to figure out how much researchers know about data publishing and what would incentivize them to make it a common practice.

Researchers are the end users of research data publishing platforms and yet they are providing the least amount of input into these systems.

And if you think that is confusing, there is an additional layer of disorder: “researchers” is used as an umbrella term for various levels of scientists and humanists who can have drastically different opinions and values based on discipline and status.

I visited labs and took PIs, grad students, and postdocs to coffee at UCSF, UC Berkeley, and UC Santa Cruz. Coming from a science background and spending time convincing authors to make their data available at PLOS, I thought I had a pretty good sense of incentives, but I needed to span disciplines and leave the mindset of “you have to make your data available, or your paper will not be published” to hear researchers’ honest answers. Here’s what I found:

People like the idea of data publishing in theory, but in practice, motivation is lacking and excuses are prominent.

This is not surprising, though. The following is an example scenario (with real quotes) of how data publishing is perceived at various career stages (for consistency, this scenario takes place within biomedical research).

Grad Student: “Data publishing sounds awesome, I would totally put my data out there when publishing my work but it’s really up to my PI and my PI doesn’t think it is necessary.”

Post Doc: “I like the idea, but if we put data in here, are people going to use my data before I can publish 3 Nature papers as first author?”

PI: “I like the idea of having my students put their work in an archive so I can have all research outputs from the lab in one place, but until my Vice Chancellor of Research (VCR) tells me it is a priority I probably won’t use it.”

VCR: “Funder and Publisher mandates aren’t incentivizing enough?”

Publisher: “We really believe the funder mandates are the stick here.”

As you can tell, there is no consensus of understanding, and there is a gap between the theoretical and the practical implementation of data publishing. As one postdoc at UCSF said, “If I am putting on my academic hat, of course my motivation is the goodness of it. But, practically speaking, I’m not motivated to do anything.” With such differing perspectives among stakeholders, it is easy to see how difficult it is to gauge interest in data publishing!

Other reasons adoption of data publishing practices is difficult:

At conferences and within the scholarly communication world, we speak in jargon about sticks (mandates) and carrots (reproducibility, transparency). We are talking to each other: people who have already bought into these incentives and needs and are living in an echo chamber. We forget that these mandates and reasons for open data are neither well understood by nor effective for researchers themselves. Mandates and justifications about being “for the good of science” are not consistently understood across the lab. PIs are applying for grants and writing Data Management Plans (DMPs), but the grad students and postdocs are doing the data analysis and submitting the paper. There is plenty of space here for miscommunication, misinformation, and difficulty. We also say that reproducibility, transparency, and getting credit for your work are wide-ranging carrots, but reproducibility and transparency initiatives vary by field. Getting credit for publishing data seems easy (as with articles): authorship on a dataset and citations of its DOI credit the researchers who first published the data. But how can we say that researchers are currently “getting credit” for their data publications if citing data isn’t common practice, few publishers support data citations, and tenure committees aren’t looking at the reach of data?

We spend time telling one another that open data is a success because publishers have released X many data statements and repositories hold X many datasets. Editors and reviewers typically do not check (or want to check) whether the data associated with publications are actually the underlying data or are FAIR, and many high-volume repositories accept any sort of work (conference talks, PDFs, posters). How many articles have their associated data publicly available and in a usable format? How many deposits to repositories are usable research data? We must take these metrics with a grain of salt and understand that while we are making progress, there are various avenues we must invest in to make the open data movement a success.

All aspects of this are related to researcher education and lowering the activation energy (i.e. making it a common and accepted practice).

A provocative conversation to bring people together:

In my presentation at CNI I scrolled through a number of quotes from researchers that I gathered during these coffee talks, and the audience laughed at many of them. The quotes are funny (or sad or realistic or [insert every range of emotion]), but even this reaction is reason for us to re-think our ways of driving adoption of research data management and open data practices. To be talking about technologies and features that aren’t requested by researchers is getting ahead of ourselves.

Right now there should be one focus: finding incentives and ways to integrate into workflows that effectively get researchers to open up and preserve their data.

When presenting this I was apprehensive but confident: I was presenting opinions and experiences but hearing someone say ‘we’re doing it wrong’ usually does not come with applause. What came of the presentation was a 30-minute discussion full of genuine experiences, honest opinions, and advice. Some discussion points that came up:

The general consensus was that we have to re-focus on researcher needs and integrate into researcher workflows. To do this successfully:

So, let’s work together. Let’s talk to as many researchers, in as many domains and at as many career stages, as we can in 2018. Let’s share these experiences when we meet at conferences and on social media. And let’s focus on adoption of a practice (data publishing) instead of spotlighting technologies, to make open data a common, feasible, and incentivized success.