Back in December, I published a paper in the open access journal Ecosphere about data management education (available on the Ecosphere site). The manuscript is the result of my postdoctoral work with the DataONE organization, advised by Stephanie Hampton at NCEAS. When I started working with Steph, she posed an interesting question: Whatever happened to the lab notebook? Yes, people still take notes and keep notebooks, but the concept has not carried over in full. That is, data and information are increasingly born digital: how do we capture that in a pen-and-paper lab notebook?
While in grad school I printed out a lot of tables and graphs, then cut and pasted them into a lab notebook. I eventually figured out I needed to keep track of file names associated with the printouts. Of course, there are also the methods I used while creating data tables and other outputs of my analyses: I basically neglected this part altogether. The result was a patchy notebook that in no way allowed for reproducibility of my work. Sadly, I don’t think I’m alone. Although the tide may be turning towards better data management and documentation (thanks to NSF for requiring data management plans!), we have a very long way to go.
So Steph and I asked the question: Are data management and organization practices being taught to students?
To answer this, we first had to decide which students we were asking about. We decided to focus on the students who are expected to understand the value of lab notebooks, diligent note-taking, and documentation of methods. Coverage of these topics might be a bit spotty at the high school level, but science classes in undergraduate institutions have always prioritized lab notebooks.
I set out to survey undergraduate institutions that are likely to teach future ecology graduate students. Why ecology? Partly because Steph and I are ecologists who were based at the National Center for Ecological Analysis and Synthesis. Partly because DataONE focuses on Earth, environmental, ecological, atmospheric, and oceanographic data. But also, we needed to zero in on one group, so we chose ecologists.
I examined 38 large universities considered the best for graduate studies in ecology, plus 10 smaller liberal arts institutions whose outgoing ecology students receive the highest number of NSF Graduate Research Fellowships in ecology (for a full list of institutions, see Appendix A of the paper).
After selecting the institutions, I then surveyed the instructor for the institution’s ecology course. The survey (available in full as a PDF) asked about all things data management, including
- Quality control and quality assurance
- The proper way to name computer files
- Types of files and software to use
- Metadata generation
- Workflows
- Protecting data
- Databases and data archiving
- Data re-use
- Data sharing
- Notebook protocols (lab or field)
Next week I will go into a bit more detail about the results, but the gist is this: ecology undergraduates aren’t learning about data management. Although the professors find data management topics to be important for their own work, they are not inclined to find time in their curriculum to teach their students these topics. There are many reasons why this is the case; most notably, instructors mentioned a lack of time, as well as the expectation that students would learn about these topics in other courses.
In case you can’t wait to find out what I found, here are links to the manuscript:
- Ecosphere site for paper: http://www.esajournals.org/doi/full/10.1890/ES12-00139.1
- PDF of paper: http://www.esajournals.org/doi/pdf/10.1890/ES12-00139.1
- Full survey PDF: http://dx.doi.org/10.1890/ES12-00139.2
- Raw survey results and data cleaning and processing scripts for R: Ecological Archives C003-013-S1