The Science of the DeepSea Challenge
Recently the film director and National Geographic explorer-in-residence James Cameron descended to the deepest spot on Earth: the Challenger Deep in the Mariana Trench. He partnered with lots of sponsors, including National Geographic and Rolex, to make this amazing trip happen. A lot of folks outside of the scientific community might not realize this, but until this week, there had been only one successful descent to the trench by a human-occupied vehicle (that’s a submarine for you non-oceanographers). You can read more about that 1960 exploration here and here.
I could go on about how astounding it is that we know more about the moon than the bottom of the ocean, or discuss the seemingly intolerable physical conditions found at those depths, most prominently the extremely high pressure. However, what I immediately thought when reading the first few articles about this expedition was: where are the scientists?

After combing through many news stories, several National Geographic sites including the site for the expedition, and a few press releases, I discovered (to my relief) that there are plenty of scientists involved. The team that’s working with Cameron includes scientists from Scripps Institution of Oceanography (the primary scientific partner and long-time collaborator with Cameron), Jet Propulsion Lab, University of Hawaii, and University of Guam.
While I firmly believe that the success of this expedition will be a HUGE accomplishment for science in the United States, I wonder if we are sending the wrong message to aspiring scientists and youngsters in general. We are celebrating the celebrity film director involved in the project instead of the huge team of well-educated, interesting, and devoted scientists who are also responsible for this spectacular feat (I found fewer than five names of scientists in my internet hunt). Certainly Cameron deserves the bulk of the credit for enabling this descent, but I would like there to be a bit more emphasis on the scientists as well.
Better yet, how about emphasis on the science in general? It’s too early for them to release any footage from the journey down; however, I’m interested in how the samples will be/were collected, how they will be stored, what analyses will be done, whether there are experiments planned, and how the resulting scientific advances will be made just as public as Cameron’s trip was. The expedition site has plenty of information about the biology and geology of the trench, but it’s just background: there appears to be nothing about scientific methods or plans to ensure that this project will yield the maximum scientific advancement.
How does all of this relate to data and DCXL? I suppose this post falls in the category of data is important. The general public and many scientists hear the word “data” and glaze over. Data isn’t inherently interesting as a concept (except to a sick few of us). It needs just as much bolstering from big names and fancy websites as the deep sea does. After all, isn’t data exactly what this entire trip is about? Collecting data on the most remote corners of our planet? Making sure we document what we find so others can learn from it?
Here’s a roundup of some great reads about the Challenger expedition:
- National Geographic: James Cameron Begins Descent to Ocean’s Deepest Point
- National Geographic: Cameron’s dive cut short
- National Geographic press release about Cameron’s trip to the bottom
- National Geographic website for the project: Deepsea Challenge
- The Guardian: James Cameron may kill the Kraken but not our journey of discovery
- Spectacular post on Deep Sea News by Craig McClain about the value of this expedition for science and humanity
- Scripps Institution of Oceanography information page about the Deep Sea Challenge
- Stars and Stripes: Deep Sea Dive is Nothing New for the Navy
- US Navy’s Press release for 1960 Trieste trip to the trench
Hooray for Progress!
Great news on the DCXL front! We are moving forward with the Excel add-in and will have something to share with the community this summer. If you missed it, back in January the DCXL project had an existential crisis: add-in or web-based application? I posted on the subject here and here. We spent a lot of time talking to the community and collating feedback, weighing the pros and cons of each option, and carefully considering how best to proceed with the DCXL project.
And the conclusion we came to… let’s develop both!
Comparing web-based applications and add-ins (aka plug-ins) is really an apples-and-oranges comparison. How could we discount that a web-based application is yet another piece of software for scientists to learn? Or that an add-in is only useful for Excel running on a Windows operating system? Instead, we have chosen to first create an add-in (this was the original intent of the project), then move that functionality to a web-based application that will have more flexibility for the longer term.

The capabilities of the add-in and the web-based application will be similar: we are still aiming to create metadata, check the data file for .csv compatibility, generate a citation, and upload the data set to a data repository. For a full read of the requirements (updated last week), check out the Requirements page on this site. The implementation of these requirements might be slightly different, but the goals of the DCXL project will be met in both cases: we will facilitate good data management, data archiving, and data sharing.
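To make the “check the data file for .csv compatibility” requirement concrete, here is a minimal sketch of what such a check could look like. This is purely illustrative and not the actual DCXL implementation; the function names and the specific rules enforced (consistent row widths, no formulas) are my own assumptions about what “compatibility” might mean.

```python
import csv
import io

def check_csv_compatibility(rows):
    """Flag spreadsheet features that won't survive export to .csv.

    `rows` is a list of lists of cell values (strings or numbers).
    Returns a list of human-readable warnings; an empty list means
    the data should round-trip through .csv cleanly.
    """
    warnings = []
    if not rows:
        return ["sheet is empty"]
    width = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            warnings.append(f"row {i}: has {len(row)} cells, expected {width}")
        for j, cell in enumerate(row, start=1):
            # Excel formulas begin with "="; only their last computed
            # value (not the formula itself) survives a .csv export.
            if isinstance(cell, str) and cell.startswith("="):
                warnings.append(f"row {i}, col {j}: formula will be lost in .csv")
    return warnings

def to_csv(rows):
    """Serialize rows to a .csv string once they pass the check."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

A real add-in would read cells directly from the open workbook, but the underlying idea is the same: warn the user about anything that will silently disappear in the plain-text version.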
It’s true that the DCXL project is running a bit behind schedule, but we believe that it will be possible to create the two prototypes before the end of the summer. Check back here for updates on our progress.
The Digital Dark Age, Part 2
Earlier this week I blogged about the concept of a Digital Dark Age. This is a phrase that some folks are using to describe some future scenario where we are not able to read historical digital documents and multimedia because they have been rendered obsolete or were otherwise poorly archived. But what does this mean for scientific data?
Consider that Charles Darwin’s notebooks were recently scanned and made available online. This was possible because they were properly stored and archived, in a long-lasting format (in this case, on paper). Imagine if he had taken pictures of his finch beaks with a camera and saved the digital images in obsolete formats. Or ponder a scenario where he had used proprietary software to create his famous Tree of Life sketch. Would we be able to unlock those digital formats today? Probably not. We might have lost those important pieces of scientific history forever. Although it seems like software programs such as Microsoft Excel and MATLAB will be around forever, people probably said similar things about the programs Lotus 1-2-3 and iWeb.

It is a common misconception that things that are posted on the internet will be around “forever”. While that might be true of embarrassing celebrity photos, it is much less likely to be true for things like scientific data. This is especially the case if data are kept on a personal/lab website or archived as supplemental material, rather than being archived in a public repository (See Santos, Blake and States 2005 for more information). Consider the fact that 10% of data published as supplemental material in the six top-cited journals was not available a mere five years later (Evangelou, Trikalinos, and Ioannidis, 2005).
Natalie Ceeney, chief executive of the National Archives, summed it up best in this quote from The Guardian’s 2007 piece on preventing a Digital Dark Age: “Digital information is inherently far more ephemeral than paper.”
My next post and final DDA installment will provide tips on how to avoid losing your data to the dark side.
The Digital Dark Age, Part 1
“This will be known as the Digital Dark Age.” The first time I heard this statement was at the Internet Archive, during the PDA 2012 Meeting (read my blog post about it here). What did this mean? What is a Digital Dark Age? Read on.
While serving in Vietnam, my father wrote letters to my grandparents about his life fighting a war in a foreign country. One of his letters was sent to arrive in time for my grandfather’s birthday, and it contained a lovely poem that articulated my father’s warm feelings about his childhood, his parents, and his upbringing. My grandparents kept the poem framed in a prominent spot in their home. When I visited them as a child, I would read the poem written in my young dad’s handwriting, stare at the yellowed paper, and think about how far that poem had to travel to relay its greetings to my grandparents. It was special: for its history, the people involved, and the fact that these people were intimately connected to me.
Now fast forward to 2012. Imagine modern-day soldiers all over the world, emailing, making satellite phone calls, and chatting with their families via video conferencing. When compared to snail mail, these modern communication methods are likely a much preferred way of staying in touch for those families. But how likely is it that future grandchildren will be able to listen to those conversations, read those emails, or watch those video calls? The answer is extremely unlikely.
These two scenarios sum up the concept of a Digital Dark Age: compared to 40 years ago, we are doing a terrible job of ensuring that future generations will be able to read our letters, look at our pictures, or use our scientific data.

The Digital Dark Age “refers to a possible future situation where it will be difficult or impossible to read historical digital documents and multimedia, because they have been stored in an obsolete and obscure digital format.” The phrase “Dark Age” is a reference to The Dark Ages, a period in history around the beginning of the Middle Ages characterized by a scarcity of historical and other written records at least for some areas of Europe, rendering it obscure to historians. Sounds scary, no?
How can we remedy this situation? What are people doing about it? Most importantly, what does this mean for scientific advancement? Check out my next post to find out.
Fun Uses for Excel

It’s Friday! Better still, it’s Friday afternoon! To honor all of the hard work we’ve done this week, let’s have some fun with Excel. Check out these interesting uses for Excel that have nothing to do with your data:
Want to see some silly spreadsheet movies? Here ya go.
Excel Hero: Download .xls files that create nifty optical illusions. Here’s one of them.
From PCWorld, Fun uses for Excel, including a Web radio player that plays inside your worksheet (click to download the zip file and then select a station), or simulating dice rolls in case of a lack-of-dice emergency during Yahtzee.
Mona Lisa never looked so smart. Want to know more? Check out the YouTube video tutorial or read Creating art with Microsoft Excel from the blog digital inspiration.
Why You Should Floss
No, I won’t be discussing proper oral hygiene. What I mean by “flossing” is actually “backing up your data”. Why the floss analogy? Here are the similarities between flossing and backing up your data:
- It’s undisputed that it’s important
- Most people don’t do it as often as they should
- You lie (to yourself, or your dentist) about how often you do it

So think about backing up similarly to the way you think about flossing: you probably aren’t doing it enough. In this post, I will provide some general guidance about backing up your data; as always, the advice will vary greatly depending on the types of data you are generating, how often they change, and what computational resources are available to you.
First, create multiple copies in multiple locations. The old rule of thumb is original, near, far. The first copy is your working copy of data; the second copy is kept near your original (this is most likely an external hard drive or thumb drive); the third is kept far from your original (off site, such as at home or on a server outside of your office building). This is the important part: all three of these copies should be up-to-date. Which brings me to my second point.
Second, back up your data more often. I have had many conversations with scientists over the last few months, and I always ask, “How do you back up your data?” Answers range, but most of them scare me silly. For instance, there was a 5th year graduate student who had all of her data on a six-year-old laptop, and only backed up once a month. I get heart palpitations just typing that sentence. Other folks have said things like “I use my external drive to back things up once every couple of months”, or worst case scenario, “I know I should, but I just don’t back up”. It is strongly recommended that you back up every day. It’s a pain, right? There are two very easy ways to back up every day, and neither requires purchasing any hardware or software: (1) keep a copy on Dropbox, or (2) email yourself the data file as an attachment. Note: these suggestions are not likely to work for large data sets.
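If Dropbox or email doesn’t fit your workflow, the daily copy can also be scripted. Here’s a minimal sketch (the function name and paths are hypothetical, and this is generic advice, not a DCXL feature) that copies a data file into a folder named for today’s date, so earlier days’ copies are preserved rather than overwritten:

```python
import shutil
from datetime import date
from pathlib import Path

def backup_file(source, backup_root):
    """Copy `source` into a dated subfolder of `backup_root`.

    Creates e.g. backup_root/2012-03-30/mydata.xlsx, leaving previous
    days' copies untouched so you keep a history of versions.
    """
    source = Path(source)
    dest_dir = Path(backup_root) / date.today().isoformat()
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / source.name
    shutil.copy2(source, dest)  # copy2 also preserves timestamps
    return dest
```

A script like this can be run automatically every day with cron (Mac/Linux) or Task Scheduler (Windows), which removes the “I forgot” failure mode entirely.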
Third, find out what resources are available to you. Institutions are becoming aware of the importance of good backup and data storage systems, which means there might be ways for you to back up your data regularly with minimal effort. Check with your department or campus IT folks and ask about server space and automated backup service. If server space and/or backing up isn’t available, consider joining forces with other scientists to purchase servers for backing up (this is an option for professors more often than graduate students).
Finally, ensure that your backup plan is working. This is especially important if others are in charge of data backup. If your lab group has automated backup to a common computer, check to be sure your data are there, in full, and readable. Ensure that the backup is actually occurring as regularly as you think it is. More generally, you should be sure that if your laptop dies, or your office is flooded, or your home is burgled, you will be able to recover your data in full.
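One lightweight way to make that final check is to compare checksums: if a backup file’s digest matches the original’s, the copy is intact byte-for-byte. Here is a minimal sketch (the function names are my own, not from any particular backup tool):

```python
import hashlib

def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 digest of a file, read in chunks so that
    large data files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(original, backup):
    """True only if the backup is byte-for-byte identical to the original."""
    return file_checksum(original) == file_checksum(backup)
```

Checksums catch the quiet failures, truncated copies and corrupted files, that a simple “does the file exist?” glance would miss.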
For more information on backing up, check out the DataONE education module “Protected back-ups”
Tweeting for Science
At risk of veering off course of this blog’s typical topics, I am going to post about tweeting. This topic is timely given my previous post about the lack of social media use in Ocean Sciences, the blog post that it spawned at Words in mOcean, and the Twitter hash tag #NewMarineTweep. A grad school friend asked me recently what I like about tweeting (ironically, this was asked using Facebook). So instead of touting my thoughts on Twitter to my limited Facebook friends, I thought I would post here and face the consequences of avoiding DCXL almost completely this week on the blog.
First, there’s no need to reinvent the wheel. Check out these resources about tweeting in science:
- Wired did a great piece on Twitter + Science, including a list of tweets collected by the piece’s author about why scientists choose to tweet. Don’t take my word for it: read up on what the masses said about Twitter.
- The social media expert + scientist Christie Wilcox (aka @NerdyChristie) created a super duper set of slides about “Why every lab should tweet”; it’s a visual, easy-to-follow way to understand how Twitter could shape your science for the better.
- Of course, the amazing Marine Science blog Deep Sea News posted about Twitter’s power way back in 2010. Read up on what they say about it.
- The blog Biodiversity in Focus (written by a grad student) recently posted about science and Twitter use, and sums up why it’s valuable in a single word: Networking.
- If you are more geoscience-inclined, check out AGU’s piece on Twitter in science.
That being said, I will now pontificate on the value of Twitter for science, in handy numbered list form.
- It saves me time. This might seem counter-intuitive, but it’s absolutely true. If you are a head-in-the-sand kind of person, this point might not be for you. But I like to know what’s going on in science, science news, the world of science publishing, science funding, etc. etc. That doesn’t even include regular news or local events. The point here is that instead of checking websites, digging through RSS feeds, or having an overfull email inbox, I have filtered all of these things through HootSuite. HootSuite is one of several free services for organizing your Twitter feeds; mine looks like a bunch of columns arranged by topic. That way I can quickly and easily check on the latest info, in a single location. Here’s a screenshot of my HootSuite page, to give you an idea of the possibilities: click to open the PDF: HootSuite_Screenshot
- It is great for networking. I’ve met quite a few folks via Twitter that I probably never would have encountered otherwise. Some have become important colleagues, others have become friends, and all of them have helped me find resources, information, and insight. I’ve been given academic opportunities based on these relationships and connections. How does this happen? The Twittersphere is intimate and small enough that you can have meaningful interactions with folks. Plus, there’s tweetups, where Twitter folks meet up at a designated physical location for in-person interaction and networking.
- It’s the best way to experience a conference, whether or not you are physically there. This is what spawned that previous post about Oceanography and the lack of social media use. I was excited to experience my first Ocean Sciences meeting with all of the benefits of Twitter, only to be disappointed at the lack of participation. In a few words, here’s how conference (or any event) tweeting works:
- A hash tag is declared. It’s something short and pithy, like #Oceans2012. How do you find out about the tag? Usually the organizing committee tells you, or in lieu of that you rely on your Twitter network to let you know.
- Everyone who tweets about a conference, interaction, talk, etc. uses the hash tag in their tweet. Examples:

- Hash tags are ephemeral, but they allow you to see exactly who’s talking about something, whether you follow them or not. They are a great way to find people on Twitter that you might want to network with… I’m looking at you, @rejectedbanana @miriamGoldste.
- If you are not able to attend a conference, you can “follow along” on your computer and get real-time feeds of what’s happening. I’ve followed several conferences like this: over the course of the day, I will check in on the feed a few times and see what’s happening. It’s the next best thing to being there.
I could continue expounding the greatness of Twitter, but as I said before, others have done a better job than I could (see links above). No, it’s not for everyone. But keep in mind that you can follow people, hash tags, etc. without actually ever tweeting. You can reap the benefits of everything I mentioned above, except for the networking. Food for thought.
My friend from WHOI, who also attended the Ocean Sciences meeting, emailed me this comment later:
…I must say those “#tweetstars” were pretty smug about their tweeting, like they were sitting at the cool kids table during lunch or something…
I countered that it was more like those tweeting at OS were incredulous at the lack of tweets, but yes, we are definitely the cool kids.
Help us get started
We at University of California Curation Center (UC3) are very interested in engaging in the conversations happening about data. The purpose of this blog is to explore the landscape of digital data. We are interested in topics such as
- data publication
- data sharing
- data archiving
- data citation
- open data
- open science
Do you have topics you would like us to discuss on this blog? Please comment on this post or email me. The conversation is only beginning, and is sure to be interesting.
Oceanographers: Why So Shy?
Last week I attended the TOS/ASLO/AGU Ocean Sciences 2012 Meeting in Salt Lake City. (If you are a DCXL blog regular, you know I was also at the Personal Digital Archiving 2012 Conference last week: my ears were bleeding by Friday night!). These two conferences were starkly different in many ways. Ocean Sciences had about 4,000 attendees, while PDA was closer to 100. Ocean Sciences had concurrent sessions, plenaries, and workshops, while PDA had only one room where all of the speakers presented. Although both provided refreshments during breaks, PDA’s coffee and treats far surpassed those provided at the Salt Palace. But the most interesting difference? The incorporation of social media into the conference.
There are some amazing blogs out there for ocean scientists: Deep Sea News and SeaMonster come to mind immediately. There are also a plethora of active tweeters and bloggers in the ocean sciences community, including @labroides @jebyrnes (and his blog) @MiriamGoldste @RockyRohde @JohnFBruno @kzelnio @SFriedScientist @rejectedbanana @DrCraigMc @rmacpherson @Dr_Bik. I’m sure I’ve left some great ones out (feel free to tweet me and let me know: @carlystrasser).
That being said, ocean scientists stink at social media if OS 2012 was any indication.
First, the Ocean Sciences Meeting did not declare a hash tag – this is the first major conference I’ve been to in a while that didn’t do so. What does this mean? Those of us who were trying to communicate about OS 2012 via Twitter were not able to converge under a single hash tag until Tuesday (#oceans2012). Perhaps that isn’t such a big deal since there were only a dozen Tweeters at the conference. This is unusual for a conference of this size: at AGU 2011 in December, I would hazard a guess that there were more like 200 Tweeters. Food for thought.
Second, I heard from @MiriamGoldste that there was actual, audible clapping when disparaging comments were made about social media in one of the presentations. For shame, oceanographers! You should take advantage of tools offered to you; short of using social media yourself, you should recognize its growing importance in science (read some of the linked articles below).
Now for PDA 2012. A hash tag was declared (#pda12) and about 2 dozen active tweeters were off and running. We had dialogues during the conference, helped answer each others’ questions, commented on speakers’ major conclusions, and generally kept those that couldn’t attend the conference in person abreast of the goings-on. Combine that with real-time blogging of the meeting, and you had a recipe for being connected whether you were sitting in a pew at the Internet Archive or not. Links were tweeted to newly-posted slides, and generally there was a buzz about the conference.
So listen up, OS 2012 attendees: You are being left in the dust by other scientists who have embraced social media. I know what you are thinking: “I don’t have time to do all of that stuff!” One of the conference tweets says it best:
More information…
Read this great post from Scientific American on Social Media for Scientists
COMPASS: Communication partnership for science and the sea. I attended a COMPASS workshop two years ago at NCEAS and was swayed by the lovely Liz Neeley that social media was not only worth my time, but it could advance my career (read “Highly tweeted articles were 11x more likely to be cited” from The Atlantic).
Generally all of the resources on the Social Media For Scientists wikispace
Social Media for Scientists Recap from American Fisheries Society blog
As for how social media relates to the DCXL project, isn’t it obvious? I’ve been collecting feedback straight from potential DCXL users using social media. Because I have tapped into these networks, the DCXL project’s outcomes are likely to be useful for a large contingent of our target audience.

Archiving Your Life: PDA 2012 Meeting
I’m currently sitting in a church. No, I’m not being disrespectful and blogging while at church. Technically, I’m in a former church, in the Richmond District of San Francisco. The Internet Archive bought an old church and turned it into an amazing space for their operation, as well as for meetings like the 2012 Personal Digital Archiving Meeting I’m currently attending.
I wasn’t sure what “personal digital archiving” meant, exactly, before I heard about this conference. It turns out the concept is very familiar to me. It’s basically thinking about how to preserve your life’s digital content – photos, emails, writings, files, scanned images, etc. etc. The concept of archiving personal materials is a very hot topic right now. Think about Facebook, Storify, iCloud, WordPress, and Flickr, to name a few. As a scientist, I actually think of my data as personal digital files: they represent a very long period of my life, after all. So I’m at this meeting talking a bit about DCXL, and also learning a lot about some amazing new stuff. Here are a few interesting tidbits:
Cowbird: This is a place for people to tell stories, rather than just archive their lives. According to the founder (who is attending this conference), Cowbird is about the experience of life, as opposed to merely curating life. For an amazing, moving example of how Cowbird works, check this out: First Love
The Brain: Very cool, free software that helps you organize links, definitions, notes, etc. The idea is that it works just like your brain: it makes connections and creates networks to provide meaning to each link. Play with it a bit and you will be hooked.
Pinboard: Technically, I already knew about Pinboard. But the founder of the bookmarking system gave a great talk, so I’m including it here. Pinboard has been described as how the bookmarking service Delicious used to work, before it stopped working well. For a very small fee (~$10) you can store your bookmarks, tag them, and even save copies of the web pages as they were when you viewed them- this comes in particularly handy if you use a website for research and it might mysteriously disappear without warning. My favorite thing about Pinboard is it isn’t mucked up with ads and other visual distractions.



