At the University of California Curation Center (UC3), our commitment to advancing data curation and publishing is rooted in our belief in open access and the open data movement. For years, we’ve worked to support researchers and ensure that UC scholarship resonates beyond academia. Our recent efforts, including our successful partnership with Dryad, are part of a broader strategy to amplify UC research and foster a more connected and open scientific landscape. Like every area of our work, research data is evolving rapidly, and the UC3 data curation team is embracing this change. Following the successful conclusion of the Dryad co-development work, we are now exploring new projects that continue to support the research data community. This blog post outlines our direction and describes a few ways our team is building on past successes as it continues to evolve.
Overcoming Challenges, Exploring Opportunities
Publishing research data is complex, especially when ensuring it is Findable, Accessible, Interoperable, and Reusable (FAIR). However, advancements in artificial intelligence (AI) may offer exciting opportunities. Through our work with partners at the Generalist Repository Ecosystem Initiative (GREI), we have started to investigate AI tools that can help streamline curation by providing data creators with real-time feedback, targeted guidance, and even dynamic visualizations. This approach could simplify and enhance the publication process, making it more accessible and valuable to researchers.
Revolutionizing the Curation Process
High-quality, accessible research data is essential for progress in any field. Common dataset issues such as missing or inconsistent metadata, formatting errors, and lack of standardization can hinder reuse. To address these challenges directly, we are evaluating approaches that transform manual data curation processes. By doing so, we aim to unlock the full potential of datasets, enabling greater collaboration and reproducibility across fields.
Our two-part strategy focuses on:
- Pre-Deposit Support: Researchers receive interactive assistance in preparing their data for publication, ensuring it is ready for widespread dissemination and interoperable use.
- Post-Deposit Enhancement: This process involves reviewing and enhancing published datasets to improve their quality, usability, and potential for further research and applications.
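To make the pre-deposit idea concrete, here is a minimal sketch of the kind of automated check that could give a researcher feedback before submission. It is an illustration only, not a description of any UC3 or repository tool: the `REQUIRED_FIELDS` list and the `check_metadata` function are hypothetical names, and each repository defines its own required metadata.

```python
# Hypothetical list of required fields; real repositories define their own.
REQUIRED_FIELDS = ("title", "description", "creators", "license")


def check_metadata(metadata: dict) -> list[str]:
    """Return human-readable problems found in a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # Treat absent values, empty strings, and empty lists as missing.
        if value is None or (isinstance(value, (str, list)) and len(value) == 0):
            problems.append(f"missing required field: {field}")
        # Flag stray whitespace, a common source of inconsistent metadata.
        elif isinstance(value, str) and value != value.strip():
            problems.append(f"leading or trailing whitespace in: {field}")
    return problems


# Made-up example record with three problems.
record = {
    "title": " Soil moisture 2021",
    "description": "",
    "creators": ["A. Researcher"],
}

for problem in check_metadata(record):
    print(problem)
```

The same function could, in principle, be run over already-published records to find candidates for post-deposit enhancement.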
Collaboration for a Brighter Future
Our team has a long history of collaboration within the repository ecosystem. Through these partnerships, we have been able to learn from our peers and help strengthen the data publishing space. Our team’s continued participation in GREI extends that community of data repository infrastructure even further. This initiative brings together seven generalist repositories (Zenodo, figshare, Dryad, Vivli, Mendeley Data, Center for Open Science, and Dataverse), allowing UC3 to leverage a wide range of tools and expertise for handling diverse datasets and complex curation tasks. By collaborating with the repositories in GREI, we have opportunities to learn from and work with varying approaches to managing and sharing data.
Data Packages: A Key to Unlocking Potential
One promising avenue our team has been evaluating is the use and utility of Data Packages, a concept pioneered by the Frictionless Data project at the Open Knowledge Foundation. Data Packages elevate the value of datasets by ensuring data and essential metadata are prepared in predictable structures, making them self-explanatory, easily shareable, and reusable. This enhances discoverability and usability for researchers. Data repositories can implement Data Packages by providing tools and guidance for consistent metadata creation. Researchers benefit from streamlined submission processes that automatically generate well-documented and accessible Data Packages. Implementing Data Packages is a key part of our broader strategy. Our initial API experiments across different repositories have shown promising results. While still in the early stages, we see significant potential to scale this work, transforming how data is published and curated.
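For readers new to the format, the sketch below shows roughly what "predictable structure" means in practice. A Data Package pairs data files with a JSON descriptor, conventionally named datapackage.json, that lists each resource and its schema. The example uses only the Python standard library, not the Frictionless tooling, and is not our implementation. The dataset, the helper names `infer_type` and `describe_csv`, and the deliberately simple type inference are assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a field type (integer, number, or string) from text cells."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def describe_csv(name, path, csv_text):
    """Build a resource entry with a schema inferred from CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    fields = []
    for i, column_name in enumerate(header):
        column = [row[i] for row in rows if i < len(row)]
        fields.append({"name": column_name, "type": infer_type(column)})
    return {"name": name, "path": path, "schema": {"fields": fields}}


# Made-up sample data standing in for a researcher's CSV file.
sample = "site,depth_cm,moisture\nA,10,0.31\nB,20,0.27\n"

package = {
    "name": "soil-moisture-example",
    "title": "Example soil moisture measurements",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [describe_csv("measurements", "measurements.csv", sample)],
}

# Serialized, this is what would be saved as datapackage.json.
print(json.dumps(package, indent=2))
```

Because the descriptor travels with the data, a repository or a downstream user can read column names and types without opening the file or contacting the author.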
Building a Sustainable Future Together
Our new direction is not just about technology; it’s about building a culture of openness and collaboration. By partnering with other organizations facing similar challenges, we can “future-proof” our efforts and ensure our solutions are sustainable and adaptable to the ever-changing landscape.
We are exploring innovative tools and workflows, including automated data quality assessment, AI-powered metadata enrichment, dynamic data visualization platforms, API integrations for seamless submissions, and data packaging tools. All these efforts aim to improve data curation and publishing. As we move toward implementation, we are looking for comprehensive solutions that scale and make efficient use of resources.
We are excited about creating a more connected and open scientific landscape where research data can achieve greater reach and impact. If you want to learn more, please contact the UC3 Data Publishing Product Manager, Steve Diggs, at steve.diggs@ucop.edu.