Last week I had an interesting conversation with Hazel Asuncion from the University of Washington’s Bothell campus. Together with an undergraduate computer science student, she has been developing an add-in for Excel that helps track data provenance.
“Hmm…”, you are thinking, “what is data provenance?” (At least, that’s what I asked when I first heard the term about 16 months ago.) Data provenance is basically a description of where the data came from (here’s Wikipedia’s explanation). Provenance is an old concept in science: it’s directly related to reproducibility, which is considered among the most important tenets of scientific progress (more Wikipedia information here). Without good data provenance, i.e., a good record of how you got from measurements to reported results, others can’t independently evaluate and verify your conclusions.
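To make the idea of “a good record of how you got from measurements to reported results” a bit more concrete, here is a minimal sketch in Python of what such a record could look like. It is purely illustrative and is not how any particular tool works: the `Step` and `ProvenanceLog` names, the file and sheet names, and the actions are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Step:
    """One recorded action: what was done, to what, and what it produced."""
    action: str
    inputs: list
    output: str
    timestamp: str


@dataclass
class ProvenanceLog:
    """Hypothetical log of processing steps, kept in the order they happened."""
    steps: list = field(default_factory=list)

    def record(self, action, inputs, output):
        # Store the step together with the UTC time at which it was logged.
        now = datetime.now(timezone.utc).isoformat()
        self.steps.append(Step(action, list(inputs), output, now))

    def lineage(self, target):
        """Return the steps that led to `target`, oldest first.

        Assumes every step was recorded after the steps that produced
        its inputs, so one backward pass over the log is enough.
        """
        needed = {target}
        chain = []
        for step in reversed(self.steps):
            if step.output in needed:
                chain.append(step)
                needed.update(step.inputs)
        return list(reversed(chain))


# Made-up example: from a raw file to a figure, plus one unrelated side calculation.
log = ProvenanceLog()
log.record("import field measurements", ["raw_measurements.csv"], "Sheet1")
log.record("remove outliers", ["Sheet1"], "Sheet2")
log.record("scratch calculation", ["Sheet1"], "Scratch")
log.record("plot mean by site", ["Sheet2"], "Figure1")

for step in log.lineage("Figure1"):
    print(step.action, "<-", step.inputs)
```

Asking for the lineage of `Figure1` returns only the three steps that actually fed into the figure and leaves out the scratch calculation, which is the kind of question someone trying to verify a published graph would want answered.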
With that background in mind, consider all of the complex steps that scientists take to get from the data they collect to the graphs in a journal publication. Most scientists would probably admit that this process is extremely difficult to recreate, even for themselves. The most common way to keep track of the various steps is a lab notebook, or perhaps notes in a Word document, or maybe even notes on a separate sheet of the same Excel file. Wouldn’t it be better if there were an easy way to keep track of your actions in Excel without having to take notes manually?
Here’s where Hazel’s work comes in: she has figured out a way to document activities in Excel that integrates seamlessly with the scientist’s current workflow. It’s a great example of a useful, interesting add-in that should prove invaluable to those who use it. Curious? Hazel will also be presenting the work at the eScience Conference in Stockholm this December, and an accompanying paper will be published.
Here’s a demo of the add-in, which is available here on the condition that Hazel’s group is credited if it is used: