UC3 New Year Series: Data Publishing at CDL in 2025
Structured, well-documented, and FAIR-aligned data is the foundation of effective research dissemination. However, data publishing activities have often focused on the last step in the research process. This puts energy on helping researchers clean up disorganized data sets and placing them in repositories. While this is essential to ensuring accessibility and preservation of important data outputs, it is also important to connect the dots and address the underlying issues that lead to poor data quality in the first place. Our previous development work and continuing membership with Dryad are great examples of this commitment to supporting well-formatted deposits. However, it has also always been the strategy of the UC3 data publishing team to invest in people through training, comprehensive documentation, institutional support and policies, and innovative tools. Our goal is to connect those dots and help empower the research community with the skills and knowledge to create high-quality, well-structured data from the outset.
In 2025, we aim to create a more open, transparent, and sustainable data-sharing future by combining emerging technologies with structured training programs. This dual approach improves data deposit quality and empowers researchers to contribute to a more efficient data-publishing ecosystem.
AI Tools for Data Publishing
Conversations with repository managers often highlight recurring challenges: incomplete documentation, missing README files, or data files that don’t match metadata standards. Automated “nudges” can catch such issues at the point of deposit. The vision is for AI-based systems to serve as virtual coaches that flag inconsistencies and as active collaborators capable of implementing necessary changes where appropriate. These tools will be able to modify metadata directly, generate appropriate README files, and restructure dataset instructions when needed—transforming how researchers prepare and deposit their data.
One promising component of our 2025 strategy is the development of AI-assisted curation tools, which we’ve begun exploring as a way to give researchers real-time feedback on their data deposits. This approach uses artificial intelligence to identify potential metadata, documentation, and formatting issues before submission. We won’t cover this topic in detail here; readers interested in our AI curation initiatives can refer to our previous article, which discusses it thoroughly.
In this post, we highlight CDL’s collaboration with The Carpentries as a key component of the UC3 data publishing strategy for 2025, emphasizing the human side of our approach: training. The Carpentries teaches foundational coding and data science skills to researchers worldwide, and through our partnership, we directly address the skills gap in metadata, documentation, and data formatting.
Training Translates to Broader Impact
Good data practices are part of many successful interdisciplinary collaborations. For example, after the Deepwater Horizon spill in the Gulf of Mexico, researchers in fields as varied as biology, oceanography, engineering, and socioeconomics relied on consistent metadata standards to share thousands of datasets seamlessly. That synergy is best achieved when data management principles are embedded long before a crisis or urgent need arises. Planting the seeds of data literacy in labs and classrooms allows institutions to sidestep the friction and duplicative efforts that often accompany cross-institutional projects.
Robust training programs also help teams stay nimble when policies shift. As mandates continue to change—whether through federal agencies or international collaborations—researchers grounded in best practices can adapt quickly, avoiding costly do-overs. In this sense, the cost-effectiveness of up-front training becomes an investment in a more flexible, forward-looking data ecosystem.
Why Training Makes Data Publishing Easier (and Less Costly)
Early exposure to best practices in data management often prevents unnecessary clean-up and steep learning curves later on in a researcher’s career. This observation was echoed in discussions at a recent Earth Science Information Partners (ESIP) meeting, where a central theme was the value of weaving data skills into formal coursework—rather than treating them as optional add-ons for already overworked researchers. Students who learn these concepts in undergraduate or graduate courses, sometimes through a single assignment requiring a formal data management plan, become more adept at producing coherent, reusable datasets.
In many cases, the hands-on philosophy developed by The Carpentries aligns with such classroom activities. Whether using version control for a small-scale project or learning to structure metadata for a mock submission to a repository, these experiences reduce the likelihood of encountering major data-quality issues down the line. Once researchers join labs and undertake funded projects, they have the required knowledge to meet evolving mandates without incurring frantic, last-minute adjustments.
The Carpentries and CDL: A Long-Standing Partnership
For over a decade, CDL has worked with The Carpentries to refine curricula on coding, documentation, and data management best practices. A 2017 grant from the Institute of Museum and Library Services (IMLS) helped expand “Library Carpentry,” allowing librarians to participate actively in curation. Last year, we received another IMLS award to help The Carpentries scale their operations and curriculum. Over the years, UC3 staff have been closely involved in shaping these workshops, hosting sessions, and serving on governance councils to promote a broader culture of responsible data stewardship.
One of the main strengths of The Carpentries’ model is its train-the-trainer approach. By certifying volunteer instructors within organizations, the model makes it possible to seed new workshops across campuses and disciplines. This approach complements our participation in the Generalist Repository Ecosystem Initiative (GREI), a collaborative effort bringing together seven major generalist repositories, including Zenodo, Dryad, Vivli, Center for Open Science, and Dataverse. Through GREI, we’re expanding the reach and impact of data publishing best practices across diverse repository infrastructures.
Under the auspices of the GREI project, we’re working with selected Carpentries modules to address specific data publishing challenges across multiple repository environments. In 2025, we’ll pilot these modified modules in workshops to gain practical teaching experience with this GREI-relevant curriculum. This field testing will provide valuable instructor and participant feedback, allowing us to refine the content and delivery methods. This iterative approach ensures that these modules will ultimately integrate seamlessly into the broader Carpentries curriculum, creating sustainable resources that address the complexities of modern data publishing.
Moving Forward in 2025
High-quality data deposits rarely emerge by accident; they require intentional investment in training, documentation, institutional support, and tools. At UC3, we take a holistic approach, recognizing that creating better datasets goes beyond technical solutions – it demands strategic investments across the entire research data lifecycle.
By strengthening training programs, refining repository workflows, and making learning resources widely accessible, we help researchers at all levels produce well-structured, reusable data. Our ongoing collaborations with The Carpentries and GREI ensure that best practices continue to evolve alongside the research community’s needs. With these efforts, “deposit-ready data” can become the standard rather than the exception, reducing inefficiencies and accelerating scientific discovery. As we move through 2025 and beyond, our focus remains clear: building a sustainable, scalable, and human-centered data publishing ecosystem that empowers researchers and institutions alike.
Embracing a New Era of Data Curation: A Vision for Openness and Innovation at UC3
At the University of California Curation Center (UC3), our commitment to advancing data curation and publishing is deeply rooted in our belief in open access and the open data movement. For years, we’ve worked to support researchers and ensure that UC scholarship resonates beyond academia. Our recent efforts, including our successful partnership with Dryad, are part of a broader strategy to amplify UC research and foster a more connected and open scientific landscape. As with all areas of our work, the world of research data is evolving rapidly, and the UC3 data curation team is embracing this change. Following the successful conclusion of the Dryad co-development work, we have been exploring new projects that continue to support the research data community. This blog post outlines our direction and describes a few ways our team is building on past successes and continuing to evolve.
Overcoming Challenges, Exploring Opportunities
Publishing research data is complex, especially when ensuring it is Findable, Accessible, Interoperable, and Reusable (FAIR). However, advancements in artificial intelligence (AI) may offer exciting opportunities. Through our work with partners at the Generalist Repository Ecosystem Initiative (GREI), we have started to investigate AI tools that can help streamline curation by providing data creators with real-time feedback, targeted guidance, and even dynamic visualizations. This approach simplifies and enhances the publication process, making it more accessible and valuable to researchers.
Revolutionizing the Curation Process
High-quality, accessible research data is essential for progress in any field. Common dataset issues such as missing or inconsistent metadata, formatting errors, and lack of standardization can hinder progress. We are evaluating approaches to transform manual data curation processes to address these challenges directly. By doing so, we aim to unlock the full potential of datasets, enabling greater collaboration and reproducibility and accelerating progress across different fields.
Our two-part strategy focuses on:
- Pre-Deposit Support: Researchers receive interactive assistance in preparing their data for publication, ensuring it is ready for widespread dissemination and interoperable use.
- Post-Deposit Enhancement: This process involves reviewing and enhancing published datasets to improve their quality, usability, and potential for further research and applications.
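To make the pre-deposit idea concrete, here is a minimal, rule-based sketch of the kind of check that could run before a dataset is submitted. It is not an AI system and not UC3's implementation; the function name `pre_deposit_nudges` and the `REQUIRED_FIELDS` list are hypothetical, and a real repository would define its own metadata schema.

```python
from pathlib import PurePosixPath

# Hypothetical required fields; each repository defines its own schema.
REQUIRED_FIELDS = ("title", "description", "creators", "license")


def pre_deposit_nudges(file_names, metadata):
    """Return human-readable suggestions for a draft deposit.

    file_names: iterable of file paths in the draft deposit.
    metadata:   dict of metadata entered so far.
    """
    nudges = []

    # Nudge 1: a README (any extension, any folder) should accompany the data.
    has_readme = any(
        PurePosixPath(name).stem.lower() == "readme" for name in file_names
    )
    if not has_readme:
        nudges.append("Add a README file describing the dataset.")

    # Nudge 2: every required field should be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        is_blank_text = isinstance(value, str) and not value.strip()
        if not value or is_blank_text:
            nudges.append(f"Fill in the '{field}' metadata field.")

    return nudges


draft_nudges = pre_deposit_nudges(
    ["data/casts.csv"],
    {"title": "Example casts", "description": "", "creators": ["A. Researcher"],
     "license": "CC0-1.0"},
)
for message in draft_nudges:
    print(message)
```

An empty list means the draft passed these basic checks. The same function could in principle be run over already-published records as a starting point for post-deposit review.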
Collaboration for a Brighter Future
Our team has many past collaborations within the repository ecosystem. Through these partnerships, we have been able to learn and strengthen the data publishing space. Our team’s continued participation in GREI expands our community of data repository infrastructure even further. This initiative brings together seven generalist repositories, including Zenodo, figshare, Dryad, Vivli, Mendeley Data, Center for Open Science, and Dataverse, allowing UC3 to leverage a wide range of tools and expertise for handling diverse datasets and complex curation tasks. By collaborating with the multiple repositories in GREI, we have opportunities to learn and work with varying approaches to managing and sharing data.
Data Packages: A Key to Unlocking Potential
One promising avenue our team has been evaluating is the use and utility of Data Packages, a concept pioneered by the Frictionless Data project at the Open Knowledge Foundation. Data Packages elevate the value of datasets by ensuring data and essential metadata are prepared in predictable structures, making them self-explanatory, easily shareable, and reusable. This enhances discoverability and usability for researchers. Data repositories can implement Data Packages by providing tools and guidance for consistent metadata creation. Researchers benefit from streamlined data submission processes that automatically generate well-documented and accessible Data Packages. Implementing Data Packages is a key part of our broader strategy. Our initial API experiments across different repositories have shown promising results. While still in the early stages, we see significant potential to scale this work, transforming how data is published and curated.
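As an illustration of what "predictable structures" can mean in practice, the sketch below builds a simplified Data Package descriptor (the content of a `datapackage.json` file) for a small CSV table using only the Python standard library. The dataset name, file path, and sample values are invented for the example, the type inference is deliberately naive, and the output is not validated against the Frictionless specification. It does not reflect the API experiments mentioned above.

```python
import csv
import io
import json


def infer_type(values):
    """Guess a Table Schema type: 'integer', 'number', or 'string'."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "string"


def describe_csv(name, path, csv_text):
    """Describe one CSV table as a Data Package resource.

    Assumes a header row and rows of equal length.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = [
        {"name": column, "type": infer_type([row[i] for row in body])}
        for i, column in enumerate(header)
    ]
    return {"name": name, "path": path, "format": "csv",
            "schema": {"fields": fields}}


def build_package(name, title, resources):
    """Assemble the top-level descriptor from resource descriptions."""
    return {"name": name, "title": title, "resources": resources}


# Invented sample data, for illustration only.
sample = "station,depth_m,temp_c\nA1,10,12.5\nA2,20,11.8\n"
package = build_package(
    "example-cruise-data",
    "Example cruise data",
    [describe_csv("casts", "data/casts.csv", sample)],
)
print(json.dumps(package, indent=2))
```

Because the descriptor travels with the data files, a repository or a downstream user can read column names and types without opening the CSV itself.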
Building a Sustainable Future Together
Our new direction is not just about technology; it’s about building a culture of openness and collaboration. By partnering with other organizations facing similar challenges, we can “future-proof” our efforts and ensure our solutions are sustainable and adaptable to the ever-changing landscape.
We are exploring innovative tools and workflows, including automated data quality assessment tools, AI-powered metadata enrichment tools, dynamic data visualization platforms, API integrations for seamless submissions, and data packaging tools. All these efforts aim to improve data curation and publishing. We are actively seeking comprehensive solutions to ensure scalability and optimize resource allocation for final implementation.
We are excited about creating a more connected and open scientific landscape where research data can achieve greater reach and impact. If you want to learn more, please contact the UC3 Data Publishing Product Manager, Steve Diggs, at steve.diggs@ucop.edu.
A new opportunity to build a better (data) future
Last month I left my comfort zone.
After 30 years of working as an engineer, developer, and technical leader at Scripps Institution of Oceanography (SIO at UC San Diego), I started a new career as a Senior Product Manager and Research Data Specialist with UC Curation Center (UC3) at the California Digital Library. While it may sound like a big change, it was more of a steady evolution.
Although my projects at SIO were initially focused on scientific instrumentation, software development, and engineering specifications, I found the curation of the in situ data to be fascinating and better aligned with my skills and preferences. This led to service opportunities which included leadership positions within national and international data initiatives, and those projects allowed me to collaborate with members of UC3.
Joining their team was the next logical step.
The transition from being part of the technical staff in a research setting to being a hands-on data advocate in UC3 has been an invigorating challenge so far, and it provides an excellent opportunity to build on my foundation of knowledge and grow in new areas.
It’s an honor to pick up where my predecessor, Daniella Lowenberg, left off. I’ve long admired her approach to all things data. I am grateful for the extraordinary measures that she and John Chodacki have taken to bring me up to speed as soon as possible.
Data publishing is a dynamic, young field, and my colleagues and I will be able to help shape the conversations, initiatives, and tools that serve the international research community. I look forward to working with my new colleagues as we advocate for open data and help build and implement infrastructure to make data more discoverable, interoperable, and reusable.