Support your Data
Building an RDM Maturity Model: Part 4
By John Borghi
Researchers are faced with rapidly evolving expectations about how they should manage and share their data, code, and other research products. These expectations come from a variety of sources, including funding agencies and academic publishers. As part of our effort to help researchers meet these expectations, the UC3 team spent much of last year investigating current practices. We studied how neuroimaging researchers handle their data, examined how researchers use, share, and value software, and conducted interviews and focus groups with researchers across the UC system. All of this has reaffirmed our perception that researchers and other data stakeholders often think and talk about data in very different ways.
Such differences are central to another project, which we’ve referred to alternately as an RDM maturity model and an RDM guide for researchers. Since its inception, the goal of this project has been to give researchers tools to self-assess their data-related practices and access the skills and experience of data service providers within their institutional libraries. Drawing upon tools with convergent aims, including maturity-based frameworks and visualizations like the research data lifecycle, we’ve worked to ensure that our tools are user friendly, free of jargon, and adaptable enough to meet the needs of a range of stakeholders, including different research, service provider, and institutional communities. To this end, we’ve renamed this project yet again to “Support your Data”.
What’s in a name?
Because our tools are intended to be accessible to a people with a broad range of perceptions, practices, and priorities, coming up with a name that encompasses complex concepts like “openness” and “reproducibility” proved to be quite difficult. We also wanted to capture the spirit of terms like “capability maturity” and “research data management (RDM)” without referencing them directly. After spending a lot of time trying to come up with something clever, we decided that the name of our tools should describe their function. Since the goal is to support researchers as they manage and share data (in ways potentially influenced by expectations related to openness and reproducibility), why not just use that?
Recent Developments
In addition to thinking through the name, we’ve also refined the content of our tools. The central element, a rubric that allows researchers to quickly benchmark their data-related practices, is shown below. As before, it highlights how the management of research data is an active and iterative process that occurs throughout the different phases of a project. Activities in different phases are represented in different rows. Proceeding left to right, a series of declarative statements describes specific activities within each phase, ordered by how well they are designed to foster access to and use of data in the future.
The four levels (“ad hoc”, “one-time”, “active and informative”, and “optimized for re-use”) are intended to be descriptive rather than prescriptive.
- Ad hoc — Refers to circumstances in which practices are neither standardized nor documented. Every time researchers have to manage their data, they must design new practices and procedures from scratch.
- One time — Refers to circumstances in which data management occurs only when it is necessary, such as in direct response to a mandate from a funder or publisher. Practices or procedures implemented at one phase of a project are not designed with later phases in mind.
- Active and informative — Refers to circumstances in which data management is a regular part of the research process. Practices and procedures are standardized, well documented, and well integrated with those implemented at other phases.
- Optimized for re-use — Refers to circumstances in which data management activities are designed to facilitate the re-use of data in the future.
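To make the rubric’s shape concrete, here is a minimal sketch of it as a data structure, with project phases as rows and the four levels as columns of declarative statements. This is purely illustrative: the phase name and the statements below are hypothetical placeholders, not the actual rubric content.

```python
# The four maturity levels, ordered from least to most mature (from the post).
LEVELS = ["ad hoc", "one-time", "active and informative", "optimized for re-use"]

# Each row of the rubric is a project phase; each column is the declarative
# statement describing practices at that level. The "planning" phase and its
# statements here are invented placeholders for illustration only.
RUBRIC = {
    "planning": [
        "No data management plan exists.",
        "A plan is written only to satisfy a funder mandate.",
        "Plans are standardized and revisited throughout the project.",
        "Plans explicitly address future re-use of the data.",
    ],
}

def benchmark(phase: str, level: str) -> str:
    """Return the rubric statement for a given phase at a given maturity level."""
    return RUBRIC[phase][LEVELS.index(level)]
```

A researcher self-assessing would read across a row and pick the statement that best matches their current practice; the columns to its right then describe what more mature practice looks like.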
Each row of the rubric is tied to a one-page guide that provides specific information about how to advance practices as desired or required. Development of the content of the guides has proceeded sequentially. During the autumn and winter of 2017, members of the UC3 team met to discuss issues relevant to each phase, reduce the use of jargon, and identify how content could be localized to meet the needs of different research and institutional communities. We are currently working on revising the content based on suggestions made during these meetings.
Next Steps
Now that we have scoped out the content, we’ve begun to focus on the design aspect of our tools. Working with CDL’s UX team, we’ve begun to think through the presentation of both the rubric and the guides in physical media and online.
As always, we welcome any and all feedback about content and application of our tools.
OA Week 2017: Maximizing the value of research
By John Borghi and Daniella Lowenberg
Happy Friday! This week we’ve defined open data, discussed some notable anecdotes, outlined publisher and funder requirements, and described how open data helps ensure reproducibility. To cap off Open Access Week, let’s talk about one of the principal benefits of open data: it helps to maximize the value of research.

Research is expensive. There are different ways to break it down but, in the United States alone, billions of dollars are spent funding research and development every year. Much of this funding is distributed by federal agencies like the National Institutes of Health (NIH) and the National Science Foundation (NSF), meaning that taxpayer dollars are directly invested in the research process. The budgets of these agencies are under pressure from a variety of sources, meaning that there is increasing pressure on researchers to do more with less. Even if budgets weren’t stagnating, researchers would be obligated to ensure that taxpayer dollars aren’t wasted.
The economic return on investment for federally funded basic research may not be evident for decades, and overemphasizing certain outcomes can lead to the issues discussed in yesterday’s post. But making data open doesn’t just mean giving other researchers access; it also means giving taxpayers access to the research they paid for. Open data also enables reuse and recombination, meaning that a single financial investment can actually fund any number of projects and discoveries.
Research is time consuming. In addition to funding dollars, the cost of research can be measured in the hours it takes to collect, organize, analyze, document, and share data. “The time it takes” is one of the primary reasons cited when researchers are asked why they do not make their data open. However, while it certainly takes time to ensure open data is organized and documented in such a way as to enable its use by others, making data open can actually save researchers time over the long run. For example, one consequence of the file drawer problem discussed yesterday is that researchers may inadvertently redo work already completed, but not published, by others. Making data open helps prevent this kind of duplication, which saves time and grant funding. However, the beneficiaries of open data aren’t just other researchers: the organization and documentation involved in making data open can also save researchers from having to redo their own work.
Research is expensive and time consuming for more than just researchers. One of the key principles for research involving human participants is beneficence: maximizing possible benefits while minimizing possible risks. Providing access to data by responsibly making it open increases the chances that researchers will be able to use it to make discoveries that result in significant benefits. Said another way, open data ensures that the time and effort graciously contributed by human research participants helps advance knowledge in as many ways as possible.
Making data open is not always easy. Organization and documentation take time. De-identifying sensitive data so that it can be made open responsibly can be less than straightforward. Understanding why doesn’t automatically translate into knowing how. But we hope this week we’ve given you some insight into the advantages of open data, both for individual researchers and for everyone that engages, publishes, pays for, and participates in the research process.
Welcome to OA Week 2017!
By John Borghi and Daniella Lowenberg
It’s Open Access week and that means it’s time to spotlight and explore Open Data as an essential component to liberating and advancing research.

Let’s Celebrate!
Who: Everyone. Everyone benefits from open research. Researchers opening up their data provides access to the people who paid for it (including taxpayers!), patients, policy makers, and other researchers who may build upon it and use it to expedite discoveries.
What: Making data open means making it available for others to use and examine as they see fit. Open data is about more than just making the data available on its own, it is also about opening up the tools, materials, and documentation that describes how the data were collected and analyzed and why decisions about the data were made.
When: Data can be made open anytime a paper is published, anytime null or negative results are found, anytime data are curated. All the open data, all the time.
Where: If you are a UC researcher, free resources are available on each UC campus’s Research Data Management library website. Dash is a data publication platform to make your data open and archived for participating UC campuses, UC Press, and DataONE’s ONEShare. For more open data resources, check out our upcoming post on Wednesday, October 25th.
Why: Data are what support conclusions, discoveries, cures, and policies. Opening up articles for free access to the world is very important, but the articles are only so valuable without the data that went into them.
Follow this week as we cover policies, user stories, resources, economics, and justifications for why researchers should all be making their (de-identified, IRB approved) data freely available.
Tweet to us @UC3CDL with any questions, comments, or contributions you may have.
Upcoming Posts
Tuesday, October 24th: Open Data in Order to… Stories & Testimonials
Wednesday, October 25th: Policies, Resources, & Guidance on How to Make Your Data Open
Thursday, October 26th: Open Data and Reproducibility
Friday, October 27th: Open Data and Maximizing the Value of Research
Great talks and fun at csv,conf,v3 and Carpentry Training
“Day1 @CSVConference! This is the coolest conf I ever been to #csvconf pic.twitter.com/ao3poXMn81” — Yasmina Anwar (@yasmina_anwar), May 2, 2017

On May 2–5, 2017, I (Yasmin AlNoamany) was thrilled to attend the csv,conf,v3 2017 conference and the Software/Data Carpentry instructor training in Portland, Oregon, USA. It was a unique experience to attend and speak with many …
Source: Great talks and fun at csv,conf,v3 and Carpentry Training
The Science of the DeepSea Challenge
Recently the film director and National Geographic explorer-in-residence James Cameron descended to the deepest spot on Earth: the Challenger Deep in the Mariana Trench. He partnered with lots of sponsors, including National Geographic and Rolex, to make this amazing trip happen. A lot of folks outside of the scientific community might not realize this, but until this week, there had been only one successful descent to the trench by a human-occupied vehicle (that’s a submarine for you non-oceanographers). You can read more about that 1960 exploration here and here.
I could go on about how astounding it is that we know more about the moon than the bottom of the ocean, or discuss the seemingly intolerable physical conditions found at those depths, most prominently the extremely high pressure. However, what I immediately thought when reading the first few articles about this expedition was: where are the scientists?

After combing through many news stories, several National Geographic sites including the site for the expedition, and a few press releases, I discovered (to my relief) that there are plenty of scientists involved. The team that’s working with Cameron includes scientists from Scripps Institution of Oceanography (the primary scientific partner and long-time collaborator with Cameron), Jet Propulsion Lab, University of Hawaii, and University of Guam.
While I firmly believe that the success of this expedition will be a HUGE accomplishment for science in the United States, I wonder if we are sending the wrong message to aspiring scientists and youngsters in general. We are celebrating the celebrity film director involved in the project instead of the huge team of well-educated, interesting, and devoted scientists who are also responsible for this spectacular feat (I found fewer than five names of scientists in my internet hunt). Certainly Cameron deserves the bulk of the credit for enabling this descent, but I would like there to be a bit more emphasis on the scientists as well.
Better yet, how about emphasis on the science in general? It’s too early for them to release any footage from the journey down, but I’m interested in how the samples will be/were collected, how they will be stored, what analyses will be done, whether there are experiments planned, and how the resulting scientific advances will be made just as public as Cameron’s trip was. The expedition site has plenty of information about the biology and geology of the trench, but it’s just background: there appears to be nothing about scientific methods or plans to ensure that this project will yield the maximum scientific advancement.
How does all of this relate to data and DCXL? I suppose this post falls in the category of data is important. The general public and many scientists hear the word “data” and glaze over. Data isn’t inherently interesting as a concept (except to a sick few of us). It needs just as much bolstering from big names and fancy websites as the deep sea does. After all, isn’t data exactly what this entire trip is about? Collecting data on the most remote corners of our planet? Making sure we document what we find so others can learn from it?
Here’s a roundup of some great reads about the Challenger expedition:
- National Geographic: James Cameron Begins Descent to Ocean’s Deepest Point
- National Geographic: Cameron’s dive cut short
- National Geographic press release about Cameron’s trip to the bottom
- National Geographic website for the project: Deepsea Challenge
- The Guardian: James Cameron may kill the Kraken but not our journey of discovery
- Spectacular post on Deep Sea News by Craig McClain about the value of this expedition for science and humanity
- Scripps Institution of Oceanography information page about the Deep Sea Challenge
- Stars and Stripes: Deep Sea Dive is Nothing New for the Navy
- US Navy’s Press release for 1960 Trieste trip to the trench
Oceanographers: Why So Shy?
Last week I attended the TOS/ASLO/AGU Ocean Sciences 2012 Meeting in Salt Lake City. (If you are a DCXL blog regular, you know I was also at the Personal Digital Archiving 2012 Conference last week: my ears were bleeding by Friday night!) These two conferences were starkly different in many ways. Ocean Sciences had about 4,000 attendees, while PDA was closer to 100. Ocean Sciences had concurrent sessions, plenaries, and workshops, while PDA had only one room where all of the speakers presented. Although both provided refreshments during breaks, PDA’s coffee and treats far surpassed those provided at the Salt Palace. But the most interesting difference? The incorporation of social media into the conference.
There are some amazing blogs out there for ocean scientists: Deep Sea News and SeaMonster come to mind immediately. There are also a plethora of active tweeters and bloggers in the ocean sciences community, including @labroides, @jebyrnes (and his blog), @MiriamGoldste, @RockyRohde, @JohnFBruno, @kzelnio, @SFriedScientist, @rejectedbanana, @DrCraigMc, @rmacpherson, and @Dr_Bik. I’m sure I’ve left some great ones out; feel free to tweet me (@carlystrasser) and let me know!
That being said, ocean scientists stink at social media if OS 2012 was any indication.
First, the Ocean Sciences Meeting did not declare a hashtag; this is the first major conference I’ve been to in a while that didn’t do so. What does this mean? Those of us who were trying to communicate about OS 2012 via Twitter were not able to converge under a single hashtag until Tuesday (#oceans2012). Perhaps that isn’t such a big deal, since there were only a dozen tweeters at the conference. This is unusual for a conference of this size: at AGU 2011 in December, I would hazard to guess that there were more like 200 tweeters. Food for thought.
Second, I heard from @MiriamGoldste that there was actual, audible clapping when disparaging comments were made about social media in one of the presentations. For shame, oceanographers! You should take advantage of tools offered to you; short of using social media yourself, you should recognize its growing importance in science (read some of the linked articles below).
Now for PDA 2012. A hashtag was declared (#pda12) and about two dozen active tweeters were off and running. We had dialogues during the conference, helped answer each other’s questions, commented on speakers’ major conclusions, and generally kept those who couldn’t attend the conference in person abreast of the goings-on. Combine that with real-time blogging of the meeting, and you had a recipe for being connected whether you were sitting in a pew at the Internet Archive or not. Links were tweeted to newly posted slides, and generally there was a buzz about the conference.
So listen up, OS 2012 attendees: You are being left in the dust by other scientists who have embraced social media. I know what you are thinking: “I don’t have time to do all of that stuff!” One of the conference tweets says it best:
More information…
Read this great post from Scientific American on Social Media for Scientists
COMPASS: Communication partnership for science and the sea. I attended a COMPASS workshop two years ago at NCEAS and was swayed by the lovely Liz Neeley that social media was not only worth my time, but it could advance my career (read “Highly tweeted articles were 11x more likely to be cited” from The Atlantic).
Generally all of the resources on the Social Media For Scientists wikispace
Social Media for Scientists Recap from American Fisheries Society blog
As for how social media relates to the DCXL project, isn’t it obvious? I’ve been collecting feedback straight from potential DCXL users using social media. Because I have tapped into these networks, the DCXL project’s outcomes are likely to be useful for a large contingent of our target audience.

Academic Libraries: Under-Used & Under-Appreciated
I’m guilty. I often admit this when I meet librarians at conferences and workshops: I’m guilty of never using my librarians as a resource in my 13 years of higher ed, spread across seven academic institutions. At the very impressive MBL-WHOI Library in Woods Hole, MA, there are quite a few friendly librarians who make their presence known to visitors. They certainly offered to help me, but it never occurred to me that they might be useful beyond telling me on what floor I could find the journal Limnology and Oceanography.
In hindsight, I didn’t know any better. Yes, we took the requisite library tour in grad school, and yes, I certainly used the libraries for research and access to books and journals, but no, I never talked to the librarians. Why is this? I have a few theories:
Librarians are terrible at self-promotion. Every time I meet a librarian, I’m awed and amazed by the vast quantities of knowledge they hold about all kinds of information. But most of the librarians I’ve encountered are unwilling to own up to their vast skill set. These humble folks assume scientists will come to them, completely underestimating the average academic’s stubbornness and propensity for self-sufficiency. In my opinion, librarians should stake out the popular coffee spot on campus and wear sandwich boards saying things like “You have no idea how to do research” or “Five minutes with me can change your <research> life”. Come on, librarians – toot your own horns!
Academics are trained to be self-sufficient. Every grad student has probably gotten the talk from their advisor at some point in their grad education. In my case the talk had phrases like these:
- “You don’t have to ask me EVERY time you want to run down to the supply room”
- “Which method do YOU think would work best?”
- “How should I know how to dilute that acid? Go figure it out!”
It only takes a couple of brush-offs from your advisor before you realize that part of learning to be a scientist involves solving problems all by yourself. This bodes well for future academic success, but it does not allow us to entertain the idea that librarians might be helpful and save us oodles of time.
Google gives academics a false sense of security. Yes, I spend a lot of time Googling things. Much of this Googling occurs while having a drink with friends: some hotly debated item of trivia comes up, which requires that we pull out our smartphones to find out who’s right (it’s usually me). But Google can’t answer everything. Yes, it’s wonderful for figuring out who that actor in that movie was, or for showing a latecomer the amazing honey badger video. But Google is not necessarily the most efficient way to go about scholarly research. Librarians know this: they have entire schools dedicated to figuring out how to deal with information. The field of information science, which encompasses librarianship, gives out graduate degrees in information. Do you really think that you know more about research than someone with a grad degree in information? Extremely unlikely. Learn more about Information Science here.

This post does, in fact, relate to the DCXL project. If you weren’t aware, the DCXL project is based out of California Digital Library. It turns out that librarians are quite good at being stewards of scholarly communication; who better to help us navigate the tricky world of digital data curation than librarians?
This post was inspired by a great blog post published yesterday by CogSci Librarian: How Librarians Can Help in Real Life, at #Sci013, and more.