Authors: Vessela Ensberg (UC Davis), Jeanine Finn (Claremont Colleges), Greg Janée (UC Santa Barbara/California Digital Library), Amy Neeser (UC Berkeley), Scott Peterson (UC Berkeley)
The data curation unconference in 2018 took place prior to the UCDLFx meeting at UC Riverside. Nearly 30 attendees representing 11 institutions signed up to attend the event. In the course of three hours we completed two sessions and discussed seven problem statements. Topics ranged from building relationships to exchange of hands-on project experience. Below we highlight shared valuable experience and steps for moving forward in addressing these challenges.
How do you form long-term relationships with units outside the library? What do you propose when you initiate the conversation? What kind of involvement do you have in their work and vice versa?
The key to establishing these relationships is to identify units with a similar research support mission. These relationships tend to be between the library and IT units, Grant Support or the Office of Research, educational technology centers, digital learning centers, campus learning groups, student makerspace groups or library student advisory groups. Some examples of collaborators are BIDS and DLab at UC Berkeley and CRESP at UC Riverside.
Meeting with these groups can identify gaps in service. Collaboration between departments can also create a pool of consultants, in which library staff contribute their time and expertise to a growing pool of experts available for consultations with researchers. Another area for collaboration is co-teaching in different instruction areas. Finally, there are opportunities for joint events. UC Berkeley holds consulting summits twice a year for all consultants from different departments to get together and talk in a semi-organized format that sometimes resembles an unconference, but can also be more focused, with reports given by various working groups. Examples of presented projects could be implementing Docker in an instructional setting or how to acquire datasets and make them available.
Another approach to strengthening relationships with external units is to invest jointly in resources or positions. Due to the complexities associated with such collaborations, the parties may want to start with a proof of concept or a pilot project for a tool, or by hiring a limited-term position such as a CLIR Fellow.
In all approaches, it is important to communicate clearly to everyone involved what the goals are and how the outcomes will be assessed. In other words, the group agrees on what success looks like. A steering committee of stakeholders can be useful for identifying the correct goals. These agreements should be documented, sometimes as an MOU. It is important to keep in mind that sustainability is key, and relationships require maintenance.
How do you work with non-traditional data for archiving, e.g., relational databases and digital humanities products?
There are three types of challenges in archiving non-traditional products: communication, resources, and the need for workflows for novel digital research output. Beyond the usual investment of technological and human resources needed to process the materials, accommodations must be made for usability. For example, users expect to be able to stream video or search a database, while the process of archiving a database requires flattening it. The challenge is only going to grow as more faculty engage in Digital Humanities, which raises the question of how their research output will be archived once they retire.
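To make the "flattening" step concrete, here is a minimal sketch of one way a relational database could be reduced to plain CSV text for archiving. It assumes the database is SQLite; the function names and the demo table `letters` are hypothetical and only illustrate the idea, not any workflow described at the unconference.

```python
import csv
import io
import sqlite3


def flatten_database(conn):
    """Return {table_name: csv_text} for every user table in a SQLite database."""
    # List user tables, skipping SQLite's internal bookkeeping tables.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    flattened = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # First row: column names, so the CSV is self-describing.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        flattened[table] = buffer.getvalue()
    return flattened


def build_demo_database():
    """Create a tiny in-memory database (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE letters (id INTEGER PRIMARY KEY, author TEXT, year INTEGER);
        INSERT INTO letters VALUES (1, 'A. Writer', 1850);
        INSERT INTO letters VALUES (2, 'B. Reader', 1852);
        """
    )
    return conn


if __name__ == "__main__":
    for name, text in flatten_database(build_demo_database()).items():
        print(f"--- {name}.csv ---")
        print(text, end="")
```

The flattened files preserve the rows and column names, but not the ability to query across tables, which is exactly the usability gap described above.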
There are some solutions available to UC users. For instance, eScholarship now allows video streaming. Still, we need storage for, and access to, a dissemination information package that is actively used and retains many relevant functionalities, in addition to the archival information package that is rarely accessed. Ideally, we will see further integration of existing platforms (Dash and eScholarship) to enhance discovery. Perhaps a UC network can provide the infrastructure necessary to take the archival service a step further.
Have you assisted faculty by doing hands-on work with their data?
One paradigm for how libraries can interact with researchers in the area of data curation is the library providing consultation (only), e.g., giving advice on data management plans and repository selection. Another paradigm is the library acquiring (and assuming ownership of) the researcher’s data, and turning it into a library collection. Is there a middle ground? Are there paradigms in which the library plays a more active role in the handling and processing of researcher data?
The consensus that emerged out of a discussion of our collective experiences is that libraries generally do not perform hands-on work with researcher data. To the extent that librarians have worked with data, the focus has been strictly on pre-ingest, higher-level review. There are a variety of reasons for this, including limited resources and sustainability—no surprises there—but also the degree of faculty interest and the need for sufficient discipline knowledge. Some institutions (University of Michigan was noted) have policies that explicitly state (and limit) the degree of librarian involvement in faculty research.
It is in the area of metadata that libraries have played a much more hands-on role. There is ample precedent for librarians assisting with metadata preparation and review, particularly for the metadata backing dataset landing pages, since it is the metadata that directly supports library discovery services.
Ultimately, it was agreed that “success looks like researchers having the skills to curate their own data.”
How do we make publicly available data discoverable, and/or integrate into websites, help people find them?
There are two potential users who need to discover data: the casual user who is browsing and the researcher looking for deep datasets. Since users don’t think of the library catalog when they are looking for data, we need to utilize other means of communicating dataset locations. For example, research is often done on the causes and consequences of current events. With that need in mind, UC Davis’ Michele Tobias wrote a blog post about generating a set of maps for the boundaries of American Viticultural Areas. The blog post was discovered by a journalist, who followed up on the dataset. Even though the dataset was not chosen to be featured in a later publication, the episode demonstrates how writing a story that connects events with datasets helps researchers discover them. In a similar vein, clearly linking datasets with articles will assist their discovery for research.
It is also important to keep in mind how our patrons search. Since many users start with Google, dataset metadata should be discoverable that way, for example through schema.org markup for datasets or DataCite.
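As an illustration of the schema.org approach, the sketch below builds a JSON-LD description of a dataset using the schema.org `Dataset` type and wraps it in the script tag that a landing page would carry. The helper names, the dataset title, the creator, and the example.org URL are all hypothetical placeholders; a real landing page would take these values from the repository's own metadata record.

```python
import json


def dataset_jsonld(name, description, url, creator, license_url, keywords):
    """Serialize a minimal schema.org Dataset description as JSON-LD."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator},
        "license": license_url,
        "keywords": keywords,
    }
    return json.dumps(record, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element that web crawlers look for."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    jsonld = dataset_jsonld(
        name="Example boundary maps",
        description="Placeholder description of a hypothetical dataset.",
        url="https://example.org/datasets/example-boundary-maps",
        creator="Example Researcher",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["boundaries", "maps"],
    )
    print(as_script_tag(jsonld))
```

Embedding this block in the HTML of a dataset landing page is what allows a general-purpose search engine to recognize the page as describing a dataset.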
How do we better educate and advocate for data curation services with researchers?
The first step for successful advocacy is building relationships, following the discussion outlined in the beginning of this blog. In addition to the Office of Research and IT, Management Service Offices (MSO) were identified as important partners assisting with outreach. We should seek to make it easier for our partners to communicate with their network about our services, and provide them with a clear and eloquent message about what we offer. The message used for outreach needs to emphasize the free-of-charge services that set libraries apart and frame the library as a resource for data services. The services are designed to make the researchers’ work easier.
Successfully assisting researchers provides very powerful word-of-mouth marketing. Examples of success stories in assisting researchers and raising the profile of data services among them were the file transfer service at UCSD; providing training in data skills and using that training to promote services; and exploring the consequences of the claim that research data belongs to the UC Regents. A researcher who vouches for the library based on personal experience delivers a persuasive and impactful message.
What would a UC expertise consortium for RDM/curation look like?
We combined a long-term vision and a pragmatic approach in tackling this question. The Data Curation Network provided an inspiring example of how subject-specific expertise can be exchanged across partnering institutions. After Lisa Johnston’s presentation for the CKG Deep Dive in September 2017, this collaborative model was probably on the minds of many participants. To move towards this long-term vision, we proposed two actions: an exchange of educational materials and a catalog of expertise.
For the educational materials, including curriculum and training, we can use the CKG Google Drive folder. To use those effectively, we will also need a catalog and shared definitions. We can also share workflows in the same manner. Similarly, we wanted to have a catalog of skills relevant to working with research data and find out who in the Library possesses them. After some discussion we settled on undertaking this step within the membership of the CKG. We will develop a survey instrument and apply for IRB approval to distribute it to CKG members and ask them to describe their skill set and the skill set of their team. Going forward, we envision presenting a larger project proposal to DOC.
How do we engage students and involve them?
There is no simple answer here. Participants discussed a variety of experiences that have met with some success in developing student engagement with data and data curation practices. The key theme was that it is necessary for library data services advocates to *both* welcome students into library spaces to take advantage of services and go out to meet the students where they are. Students, graduates and undergraduates alike, are producing and publishing their research in greater numbers. We need to find ways to engage with them in the venues where they are already working and sharing their research products. For example, we discussed attending research poster presentations and asking authors about their funding plans as a way of introducing the library’s services for data management planning. Additionally, library programs that are not at first glance “data centric” can provide a gateway to a larger conversation about data management and preservation. Examples of these activities included workshops for establishing a scholarly identity/setting up an ORCID, developing OA educational materials, and learning how to use citation manager software. Connections with student organizations (like the graduate student organization) can also be beneficial in maintaining connections as students graduate.