
DataUp is Live!

party girls — We are celebrating. From Boston Public Library via Flickr.

That’s right: DataUp is LIVE! I’m so excited I needed to type it twice. So what does “DataUp is Live!” mean? Several things:

The DataUp website (dataup.cdlib.org) is up and running, and is chock full of information about the project, how to participate, and how to get the tool (in either web app or add-in form).
The DataUp web application is up and running (www.dataup.org). Anyone with internet access can start creating high-quality, archive-ready data! Would you rather use the tool within Excel? Download the add-in instead (available via the main site).
The DataUp code is available. DataUp is an open source project, and we strongly encourage community members to participate in the tool’s continued improvement. Check out the code on BitBucket.
The special repository for housing DataUp data, ONEShare, is up and running. This new repository is a special instance of the CDL’s Merritt Repository, and is connected to the DataONE project. ONEShare is the result of collaborations between CDL, University of New Mexico, and DataONE. Read more in my blog post about ONEShare.
Please note that the current version of DataUp is Beta: this means it’s a work in progress. We apologize for any hiccups you may encounter; in particular, there is a known issue that currently prevents spreadsheets archived via DataUp from appearing in DataONE searches.

Today also marks the integration of the old DCXL/DataUp blog with the Data Pub Blog. You probably noticed that they are combined, since the banner at the top says “Data Pub”. I will be posting here from now on, rather than at dataup.cdlib.org. The DataUp URL now holds the DataUp main website. Read more about these changes in my blog post about it. The Data Pub Blog is intended to hold “Conversations About Data”. That means we will run the gamut of potential topics, including (but not limited to) data publication, data sharing, open data, metadata, and digital archiving. There are likely to be posts from others at CDL from time to time, which means you will have access to more than just my myopic views on all things data.

The DataUp project’s core team included yours truly, Patricia Cruse (UC3 Director), John Kunze (UC3 Associate Director), and Stephen Abrams (UC3 Associate Director). Of course, no project at CDL is an island. We had SO MUCH help from the great folks here:

DataUp Website: Eric Satzman, Abhishek Salve, Robin Davis-White, Rob Valentine, Felicia Poe
DataUp Communications: Ellen Meltzer (DataUp Press Release PDF)
DataUp development: Mark Reyes, David Loy, Scott Fisher, Marisa Strong
Machine configuration: Joseph Somontan
Administrative support: Beaumont Yung, Rondy Epting-Day, Stephanie Lew

Thanks to all of you!

Counting Down Plus DataUp Webinar

celebration — Next week: The CDL DataUp team will be performing “Celebration” at a karaoke bar (undisclosed location).

We are nearing the (revised) launch date for DataUp: on Tuesday 2 October, one week from today, we plan to officially release the tool. This includes the DataUp website, the code, and the ability to download the add-in. Of course, you never know what the next week will bring. We aren’t promising these will be delivered on Tuesday, but we will do our very best!

Last week at the annual DataONE All Hands Meeting, I presented a demonstration of DataUp and showcased its capabilities for assisting in good data stewardship practices. DataUp was met with much excitement, especially from the Citizen Science Working Group (technically called the PPSR group, which stands for Public Participation in Scientific Research). The PPSR folks were very excited about shaping DataUp into something that will help their data contributors submit high-quality, well-documented data. This is one of the many extensions for which DataUp is ripe; others include its integration with repositories other than ONEShare.

If you would like a guided introduction and walk-through of the tool, mark your calendar for the DataUp webinar, scheduled for Wednesday 3 October. You need to pre-register for the webinar to receive the connection information. If you can’t make the webinar, don’t fret: we will record it and make it available afterward on the UC3 webinar page.

Have Patience

work in progress — From Flickr by London Permaculture

Like all good projects, DataUp hit a few snags near the finish line. As a result, the DataUp launch will not take place today, as described in last week’s post. We have rescheduled the launch for two weeks from today. Stay tuned!

Did you notice? We tidied up.

If you didn’t notice, check out the URL above for this post: unbeknownst to you, you have been rerouted from DataUp to Data Pub. If you are still reeling from our first change (DCXL to DataUp), we apologize. Keep in mind, however, that change is good. Turn and face the strain.

The newest move is a harbinger of many changes that are coming up in the next eight days: on September 18, we will be releasing the DataUp tool! In preparation for this release, a little housekeeping needed to be done:

vacuum — It’s time for DataUp housekeeping! From Flickr by clotho98

First, we created a lovely new website for DataUp (hat tip to the crackerjack team of user experience design folks here at the California Digital Library). The new website will have all of the bells and whistles needed to fully enjoy DataUp: links to the add-in, the web application, user guides and documentation, and the code, to name a few. Where should this website live? At dataup.cdlib.org, of course! But this requires a bit of musical chairs. So…

We are moving the DataUp blog (formerly the DCXL blog) to the Data Pub URL (datapub.cdlib.org). The CDL already has a blog residing at this URL; however, it is in dire need of sustenance. And let’s face it: although they are all data-related, many of the blog posts you’ve read here are not specific to the DataUp project. So as of now, Data Pub will be the official blog for all things data-related at CDL, not just DataUp. It will be written by yours truly (with the occasional guest post), so if you are hungry for more blog content with tenuous links to music and pop culture, then re-bookmark now.

On Tuesday next week, check out the new dataup.cdlib.org website. Stay tuned for the announcement blog post, found here on Data Pub! This URL/website will be re-branded Data Pub on Tuesday next week.

Sustaining Data

Last week, folks from DataONE gathered in Berkeley to discuss sustainability (new to DataONE? Read my post about it). Of course, lots of people are talking about sustainability in Berkeley, but this discussion focused on sustaining scientific data and its support systems. The truth is, no one wants to pay for data sustainability. Purchasing servers and software and paying for IT personnel are not cheap. Given our current grim financial times, no one is likely to make room in the budget. So who should pay? Let’s first think about the different groups that might pay.

Private foundations
Public agencies (e.g., NSF, NIH)
Institutions
Professional societies and organizations
Researchers

Although the NSF provides funds for organizations like DataONE to develop, they are not interested in funding “sustainability”. They are in the business of funding research, which means that come 2019 when NSF funding ends for DataONE, someone else is going to have to pick up the tab.

Any researcher (including myself) will tell you that the thought of paying for data archiving and personnel is not appealing. Budgets are already tight in proposals (which have record low acceptance rates); combine that with the lack of clarity about data management and archiving costs, and researchers are not eager to take on sustainability.

Many researchers see data sustainability as the domain of their institutions: providing data management and archiving services in bulk to their faculty would allow institutions both to regulate how their researchers handle their data and to remove the guesswork and confusion for the researchers themselves. However, with budget crises plaguing higher education due to rising costs and decreasing revenue, this is not a cost that institutions are likely to take on in the near future.

Money_1973 — Obviously I was going to reference Pink Floyd for this post on money… From Wikipedia.

Lack of funds for critical data infrastructure is a systemic problem, and DataUp is no exception. Although we have funds to promote DataUp and publish our findings in the course of the project, we do not have funds to continue development. There is also the question of storage for datasets. Storage is not free, and we have not yet solved the problem of who will pay in the long term for storing data ingested into the ONEShare repository via DataUp.

Now that I’ve completed this post, it seems rather bleak. I am confident, however, that we have the right people working on the problem of data sustainability. It is certainly a critical piece in the current landscape of digital data.

Love Pink Floyd AND The Flaming Lips? Check out the FL cover album for Dark Side of the Moon, including a spectacular version of “Money”.