SOS to Scientists: Help!
We are in the final stages of deciding how to proceed with the DCXL project, and we are still unsure what will work best for scientists: an add-in for Excel or a web-based application? (For a full comparison, check out my previous blog post.)
What the debate really boils down to is this: what will help scientists more? Which of the two options is most likely to foster good scientific data stewardship?
If you are a scientist, please (pretty please) take this VERY short survey on SurveyMonkey.com and help us decide what will work for most scientists most of the time.
Survey link: http://www.surveymonkey.com/s/KJHNVYC

What’s the Deal with .xlsx?
A few years back, Microsoft Excel started automatically saving my spreadsheet files with the extension .xlsx. I first noticed it when I got a new laptop for my postdoc at University of Alberta. Suddenly, I had to be cognizant of the fact that if I left Excel to its own devices, the spreadsheets I generated would not be readable on my home computer equipped with an older version of Excel.
First, let’s cover exactly what that extra “x” is for. The additional “x” in Excel file extensions stands for XML. XML is Extensible Markup Language, which is a markup language useful for data, databases, and data-related applications. The file type .xlsx is a combination of XML architecture and ZIP compression for size reduction. Here’s a succinct summary from mrexcel.com:
If you’ve ever looked at the “View Source” view of a webpage in Notepad, you are familiar with the structure of XML. While HTML allows for certain tags, like TABLE, BODY, TR, TD, XML allows for any tags. You can make up any sort of a tag to describe your data.
You can also check out Microsoft’s description of XML in Excel. What all of this means is that .xlsx files are more generalized and easier to use with web-based applications. It’s a good thing!
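To make the "XML plus ZIP" idea concrete, here is a minimal Python sketch that opens a package with the standard library's zipfile module and lists the XML files inside it. The helper names (list_xlsx_parts, read_xlsx_part, make_demo_package) are my own inventions for illustration, and the demo package is a tiny stand-in built in memory, not a valid workbook that Excel would open.

```python
import io
import zipfile


def list_xlsx_parts(source):
    """Return the names of the files stored inside an .xlsx package."""
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_xlsx_part(source, part_name):
    """Return one file from the package as text (the parts are XML)."""
    with zipfile.ZipFile(source) as package:
        return package.read(part_name).decode("utf-8")


def make_demo_package():
    """Build a tiny stand-in package in memory.

    Assumption: this only mimics the layout of a real .xlsx file
    (a ZIP archive holding XML parts); it is NOT a valid workbook.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("xl/workbook.xml", "<workbook/>")
        package.writestr(
            "xl/worksheets/sheet1.xml",
            "<worksheet><sheetData/></worksheet>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_package()
    for name in list_xlsx_parts(demo):
        print(name)
```

Passing the path of a real .xlsx file to list_xlsx_parts instead of the demo buffer should show a similar set of XML files, which is why simply renaming a copy of the file to .zip lets you browse its contents.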

You might be asking yourself why I’m writing about .xlsx. Isn’t this an old issue that folks have figured out by now? The answer to that is yes and no. Many of the scientists I have spoken with over the last few months are entrenched in their current Excel version, and have major complaints about moving to newer versions. Excel 2003 (2004 for Mac), which predates the .xlsx file type, is still heavily used among some groups. Other scientists have moved on to later versions of Excel, but still have colleagues, advisors, or collaborators who use older versions and therefore cannot open .xlsx files. So while many scientists can tell you they have noticed the new extension on their Excel files, they don’t understand the underlying changes.
Of course, you can tell Excel to generate and save files in the old .xls format by going to “Excel Options… Save” and changing your settings so files are saved as .xls.
On a Mac, the same setting is in the “Preferences…. Compatibility” menu.
The Good & Bad: Web Application versus Add-in
If you missed it, I recently posted about the future direction of the DCXL project. I boiled it down to the question of add-in versus web application. The community has offered feedback, and some major themes have emerged, which I summarize below. But first, a reminder of the goods and bads of our two possible approaches:
Web application | |
Good | Bad |
Easier to maintain, update | Requires learning new user interface |
Use with any platform (Mac, Windows, Linux, …) | |
Generalizable/extensible | Not integrated into Excel |
Community involvement easier | Offline use may be limited |
Excel Add-in | |
Good | Bad |
Integrated in workflow | Windows only |
Familiar user interface & functionality | Install & updates required |
Smaller shift in practice | Not as generalizable/extensible |
Available offline | Not as easy for community to get involved in development, improvement |
It seems that there are strong feelings on both sides of this issue. The majority are excited about the web application, but there are some serious concerns about going whole hog into the web application realm. Most of this apprehension stems from two major issues: potential problems when offline, and the lack of a visible DCXL presence in the Excel program.
Offline use: Metadata is best collected at the time the data are collected, which means the scientist might not have an internet connection. We should make sure that any features associated with generating metadata are available offline.
DCXL presence within Excel: What if we devise a way to connect the Excel user directly to the web application from within Excel? A “Lite” version of the add-in?

If we assume that we can tackle the two problems above, then the web application might be a great direction to take. The DCXL project should focus on assisting scientists with metadata generation first, and connection to repositories second. Both of these tasks may be easier with a web application. Metadata generation could be aided by connecting to existing metadata schemas and standards, and a generalizable API would make those connections easier. More interesting is the possibility of connecting with repositories and institutions: what if there were a repository-specific implementation of the DCXL web application for each interested repository? Or a DCXL web application specifically geared towards the Geology department at UC Riverside? The possibilities for connecting with existing services become more interesting if web connections are made easy.
Needless to say, we still want feedback from the community. Decisions will be made soon, so drop me an email or comment on the blog to make your voice heard.