
The Digital Dark Age, Part 2

Earlier this week I blogged about the concept of a Digital Dark Age. Some folks use this phrase to describe a future scenario in which we can no longer read historical digital documents and multimedia because their formats have become obsolete or because they were poorly archived. But what does this mean for scientific data?

Consider that Charles Darwin’s notebooks were recently scanned and made available online. This was possible because they were properly stored and archived, in a long-lasting format (in this case, on paper). Imagine if he had taken pictures of his finch beaks with a camera and saved the digital images in obsolete formats. Or ponder a scenario where he had used proprietary software to create his famous Tree of Life sketch. Would we be able to unlock those digital formats today? Probably not. We might have lost those important pieces of scientific history forever. Although it seems like software programs such as Microsoft Excel and MATLAB will be around forever, people probably said similar things about the programs Lotus 1-2-3 and iWeb.

“Darwin with Finches” by Diana Sudyka, from Flickr by Karen E James

It is a common misconception that things posted on the internet will be around “forever”. While that might be true of embarrassing celebrity photos, it is much less likely to be true for things like scientific data. This is especially the case if data are kept on a personal or lab website or archived as supplemental material, rather than being archived in a public repository (see Santos, Blake and States 2005 for more information). Consider the fact that 10% of data published as supplemental material in the six top-cited journals was not available a mere five years later (Evangelou, Trikalinos, and Ioannidis, 2005).

Natalie Ceeney, chief executive of the National Archives, summed it up best in this quote from The Guardian’s 2007 piece on preventing a Digital Dark Age: “Digital information is inherently far more ephemeral than paper.”

My next post and final DDA installment will provide tips on how to avoid losing your data to the dark side.

Why You Should Floss

No, I won’t be discussing proper oral hygiene. What I mean by “flossing” is actually “backing up your data”. Why the floss analogy? Here are the similarities between flossing and backing up your data:

It’s undisputed that it’s important
Most people don’t do it as often as they should
You lie (to yourself, or your dentist) about how often you do it

Oral (and data) hygiene can be fun! From Calisphere, courtesy of UC Berkeley Bancroft Library

So think about backing up similarly to the way you think about flossing: you probably aren’t doing it enough. In this post, I will provide general guidance about backing up your data; as always, the advice will vary greatly depending on the types of data you are generating, how often they change, and what computational resources are available to you.

First, create multiple copies in multiple locations. The old rule of thumb is original, near, far. The first copy is your working copy of data; the second copy is kept near your original (this is most likely an external hard drive or thumb drive); the third is kept far from your original (off site, such as at home or on a server outside of your office building). This is the important part: all three of these copies should be up-to-date. Which brings me to my second point.

Second, back up your data more often. I have had many conversations with scientists over the last few months, and I always ask, “How do you back up your data?” Answers range, but most of them scare me silly. For instance, there was a fifth-year graduate student who had all of her data on a six-year-old laptop and backed up only once a month. I get heart palpitations just typing that sentence. Other folks have said things like “I use my external drive to back things up once every couple of months”, or, in the worst case, “I know I should, but I just don’t back up”. It is strongly recommended that you back up every day. It’s a pain, right? There are two very easy ways to back up every day, and neither requires purchasing any hardware or software: (1) Keep a copy on Dropbox, or (2) Email yourself the data file as an attachment. Note: these suggestions are not likely to work for large data sets.
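If you are comfortable with a little scripting, the first two points (original, near, far, and backing up every day) can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended tool: the function name back_up and the example paths are made up for this post, and it assumes your "far" location shows up on your computer as an ordinary folder (for example, a mounted server share or a synced folder).

```python
import shutil
from datetime import date
from pathlib import Path


def back_up(working_file, near_dir, far_dir):
    """Copy working_file into near_dir and far_dir, stamping today's date
    into the file name so earlier backups are not overwritten.

    All names here are hypothetical; adapt them to your own setup.
    """
    source = Path(working_file)
    stamped_name = f"{source.stem}_{date.today().isoformat()}{source.suffix}"
    copies = []
    for target_dir in (Path(near_dir), Path(far_dir)):
        # Create the backup folder if it does not exist yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / stamped_name
        # copy2 copies the file contents and preserves timestamps.
        shutil.copy2(source, destination)
        copies.append(destination)
    return copies


# Example call (hypothetical paths, shown as a comment so nothing runs by accident):
# back_up("finch_beaks.csv", "/Volumes/ExternalDrive/backups", "/Volumes/LabServer/backups")
```

Running something like this once a day (by hand, or with whatever scheduler your operating system provides) keeps the near and far copies as current as your working copy.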

Third, find out what resources are available to you. Institutions are becoming aware of the importance of good backup and data storage systems, which means there might be ways for you to back up your data regularly with minimal effort. Check with your department or campus IT folks and ask about server space and automated backup services. If server space or automated backup isn’t available, consider joining forces with other scientists to purchase servers for backing up (this is an option for professors more often than graduate students).

Finally, ensure that your backup plan is working. This is especially important if others are in charge of data backup. If your lab group has automated backup to a common computer, check to be sure your data are there, in full, and readable. Ensure that the backup is actually occurring as regularly as you think it is. More generally, you should be sure that if your laptop dies, or your office is flooded, or your home is burgled, you will be able to recover your data in full.
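One way to check that a backup copy is "there, in full, and readable" is to compare a checksum of the original file against a checksum of the backup. The sketch below does this with SHA-256 from Python's standard library. The function names sha256_of and backup_is_intact are invented for this example, and comparing checksums is just one possible check; it says nothing about how often the backup runs.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks
    so that large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(original, backup):
    """Return True only if the backup file exists and its contents
    match the original byte for byte (as judged by SHA-256)."""
    backup = Path(backup)
    if not backup.is_file():
        return False
    return sha256_of(original) == sha256_of(backup)


# Example call (hypothetical paths):
# backup_is_intact("finch_beaks.csv", "/Volumes/LabServer/backups/finch_beaks.csv")
```

Reading both files all the way through also confirms that the backup is actually readable, which is the part people tend to discover too late.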

For more information on backing up, check out the DataONE education module “Protected back-ups”.

Archiving Your Life: PDA 2012 Meeting

I’m currently sitting in a church. No, I’m not being disrespectful and blogging while at church. Technically, I’m in a former church, in the Richmond District of San Francisco. The Internet Archive bought an old church and turned it into an amazing space for their operation, as well as for meetings like the 2012 Personal Digital Archiving Meeting I’m currently attending.

I wasn’t sure what “personal digital archiving” meant, exactly, before I heard about this conference. It turns out the concept is very familiar to me. It’s basically thinking about how to preserve your life’s digital content: photos, emails, writings, files, scanned images, etc. Archiving personal materials is a very hot topic right now. Think about Facebook, Storify, iCloud, WordPress, and Flickr, to name a few. As a scientist, I actually think of my data as personal digital files: they represent a very long period of my life, after all. So I’m at this meeting talking a bit about DCXL, and also learning a lot about some amazing new stuff. Here are a few interesting tidbits:

Cowbird: This is a place for people to tell stories, rather than just archive their lives. According to the founder (who is attending this conference), Cowbird is about the experience of life, as opposed to merely curating life. For an amazing, moving example of how Cowbird works, check this out: First Love

The Brain: Very cool, free software that helps you organize links, definitions, notes, etc. The idea is that it works just like your brain: it makes connections and creates networks to provide meaning to each link. Play with it a bit and you will be hooked.

Pinboard: Technically, I already knew about Pinboard. But the founder of the bookmarking system gave a great talk, so I’m including it here. Pinboard has been described as how the bookmarking service Delicious used to work, before it stopped working well. For a very small fee (~$10) you can store your bookmarks, tag them, and even save copies of the web pages as they were when you viewed them. This comes in particularly handy if you use a website for research and it might mysteriously disappear without warning. My favorite thing about Pinboard is that it isn’t mucked up with ads and other visual distractions.

The church meant for worship of all things digital: The Internet Archive. From Flickr by evan_carroll