Survey says…
A few weeks ago we reached out to the scientific community for help on the direction of the DCXL project. The major issue at hand was whether we should develop a web-based application or an add-in for Microsoft Excel. Last week, I reported our decision: rather than choose, we will develop both. This might seem like a risky proposition: the DCXL project has a one-year timeline, meaning this all needs to be developed before August (!). As someone in a DCXL meeting recently put it, aren’t we settling for “twice the product and half the features”? We discussed which features might need to be dropped from our list of desirables because of the change in trajectory; however, we are confident that both of the DCXL products we develop will be feature-rich and will meet the needs of the target scientific community. Of course, this is made easier by the fact that the features in the two products will be nearly identical.

How did we arrive at developing both an add-in and a web app? By talking to scientists. The feedback we collected made it obvious that each product has aspects that appeal to our user communities. Here’s a summary of what we heard:
Show of hands: I ran a workshop on Data Management for Scientists at the Ocean Sciences 2012 Meeting in February. At the close of the workshop, I described the DCXL project and went over the pros and cons of the add-in and web app options. By show of hands, about 80% of the audience (n ≈ 150) voted for the web app.
Conversations: here’s a sampling of some of the things folks told me about the two options:
- “I don’t want to go to the web. It’s much easier if it’s incorporated into Excel.” (add-in)
- “As long as I can create metadata offline, I don’t mind it being a web app. It seems like all of the other things it would do require you to be online anyway” (either)
- “If there’s a link in the spreadsheet, that seems sufficient. (either) It would be better to have something that stays on the menu bar no matter what file is open.” (add-in)
- “The updates are the biggest issue for me. If I have to update software a lot, I get frustrated. It seems like Microsoft is always making me update something. I would rather go to the web and know it’s the most recent version.” (web app)
- Workshop attendee: “Can it work like Zotero, where there’s ways to use it both offline and online?” (both)
Survey: I created a very brief survey using the website SurveyMonkey. I then sent the link to the survey out via social media and listservs. Within about a week, I received over 200 responses.
Education level of respondents:
Survey questions & answers:
So with those results, there was a resounding “both!” emanating from the scientific community. First, we will develop the add-in, since it best fits the needs of our target users (those who use Excel heavily and need assistance with good data management skills). We will then develop the web application, with the hope that the community at large will adopt and improve on the web app over time. The internet is a great place for building a community with shared needs and goals; we can only hope that DCXL will be adopted as wholeheartedly as other online sources of help and information.
Hooray for Progress!
Great news on the DCXL front! We are moving forward with the Excel add-in and will have something to share with the community this summer. If you missed it, back in January the DCXL project had an existential crisis: add-in or web-based application? I posted on the subject here and here. We spent a lot of time talking to the community and collating feedback, weighing the pros and cons of each option, and carefully considering how best to proceed with the DCXL project.
And the conclusion we came to… let’s develop both!
Comparing web-based applications and add-ins (aka plug-ins) is really an apples-and-oranges comparison. How could we discount that a web-based application is yet another piece of software for scientists to learn? Or that an add-in is only useful for Excel running on Windows? Instead, we have chosen to first create an add-in (this was the original intent of the project), then move that functionality to a web-based application that will have more flexibility for the longer term.

The capabilities of the add-in and the web-based application will be similar: we are still aiming to create metadata, check the data file for .csv compatibility, generate a citation, and upload the data set to a data repository. For a full read of the requirements (updated last week), check out the Requirements page on this site. The implementation of these requirements might be slightly different, but the goals of the DCXL project will be met in both cases: we will facilitate good data management, data archiving, and data sharing.
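To make “check the data file for .csv compatibility” a bit more concrete, here is a minimal Python sketch. It is not DCXL’s implementation: the function name `check_csv_compatibility` and the three rules it applies (unique, non-blank column names; the same number of cells in every row; no line breaks inside cells) are illustrative assumptions about what such a check might look for, and it assumes the spreadsheet has already been read into a list of rows.

```python
def check_csv_compatibility(rows):
    """Return a list of human-readable problems that would make a table
    awkward to save as plain CSV.

    `rows` is a list of rows (each a list of cell values) with the header
    in rows[0]. The rules below are illustrative assumptions, not DCXL's
    actual checks.
    """
    if not rows:
        return ["table is empty"]
    problems = []
    header = [str(h).strip() for h in rows[0]]

    # Every column needs a non-blank, unique name to work as a CSV header.
    if any(name == "" for name in header):
        problems.append("header has blank column names")
    seen = set()
    for name in header:
        if name and name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)

    # Number data rows the way a spreadsheet does (header is row 1).
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} cells, expected {len(header)}")
        # Line breaks are legal inside quoted CSV fields (RFC 4180),
        # but many tools split such rows incorrectly.
        if any(isinstance(cell, str) and ("\n" in cell or "\r" in cell)
               for cell in row):
            problems.append(f"row {row_number} has a cell containing a line break")
    return problems


table = [["site", "temp_C", "site"], ["A", 12.1, "note\nmore"], ["B", "13.0"]]
for problem in check_csv_compatibility(table):
    print(problem)
# duplicate column name: 'site'
# row 2 has a cell containing a line break
# row 3 has 2 cells, expected 3
```

Reporting problems rather than silently changing the data keeps the scientist in control of what gets fixed.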
It’s true that the DCXL project is running a bit behind schedule, but we believe that it will be possible to create the two prototypes before the end of the summer. Check back here for updates on our progress.
SOS to Scientists: Help!
We are in the final stages of deciding how to proceed with the DCXL project, and we are still unsure what will work best for scientists: an add-in for Excel or a web-based application? (For a full comparison, check out my previous blog post.)
What the debate really boils down to is this: what will help scientists more? Which of the two options is most likely to foster good scientific data stewardship?
If you are a scientist, please (pretty please) take this VERY short survey on SurveyMonkey.com and help us decide what will work for most scientists most of the time.
Survey link: http://www.surveymonkey.com/s/KJHNVYC

What’s the Deal with .xlsx?
A few years back, Microsoft Excel started automatically saving my spreadsheet files with the extension .xlsx. I first noticed it when I got a new laptop for my postdoc at the University of Alberta. Suddenly, I had to be cognizant of the fact that if I left Excel to its own devices, the spreadsheets I generated would not be readable on my home computer, which had an older version of Excel.
First, let’s cover exactly what that extra “x” is for. The additional “x” in Excel file extensions stands for XML. XML is Extensible Markup Language, a markup language useful for data, databases, and data-related applications. An .xlsx file is actually a ZIP-compressed package of XML files; the ZIP compression keeps file sizes down. Here’s a succinct summary from mrexcel.com:
If you’ve ever looked at the “View Source” view of a webpage in Notepad, you are familiar with the structure of XML. While HTML allows for certain tags, like TABLE, BODY, TR, TD, XML allows for any tags. You can make up any sort of a tag to describe your data.
You can also check out Microsoft’s description of XML in Excel. What all of this means is that .xlsx files are more generalized and easier to use with web-based applications. It’s a good thing!
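Because an .xlsx file is just a ZIP archive of XML files, you can peek inside one without Excel. Here is a minimal sketch using Python’s standard-library `zipfile` module; the helper names `list_xlsx_parts` and `read_part` are made up for this example.

```python
import zipfile


def list_xlsx_parts(xlsx):
    """Return the names of the XML parts inside an .xlsx workbook.

    `xlsx` can be a file path or a binary file object. An .xlsx file is a
    ZIP archive, so the standard-library zipfile module opens it directly.
    """
    with zipfile.ZipFile(xlsx) as archive:
        # .rels files are XML too; they record how the parts relate.
        return [name for name in archive.namelist()
                if name.endswith((".xml", ".rels"))]


def read_part(xlsx, part_name):
    """Return a single part, e.g. "xl/worksheets/sheet1.xml", as text."""
    with zipfile.ZipFile(xlsx) as archive:
        return archive.read(part_name).decode("utf-8")
```

On a typical workbook, `list_xlsx_parts("my_data.xlsx")` (a hypothetical file name) should list parts such as `[Content_Types].xml`, `xl/workbook.xml`, and one `xl/worksheets/sheetN.xml` per worksheet, though the exact names can vary. Opening any of them with `read_part` shows the plain, tagged XML that the mrexcel.com quote describes.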

You might be asking yourself why I’m writing about .xlsx. Isn’t this an old issue that folks have figured out by now? The answer is yes and no. Many of the scientists I have spoken with over the last few months are entrenched in their current Excel version and have major complaints about moving to newer versions. Excel 2003 (Excel 2004 on the Mac), which predates the .xlsx file type, is still heavily used among some groups. Other scientists have moved on to later versions of Excel but still have colleagues, advisors, or collaborators who use older versions and therefore cannot open .xlsx files. So while many scientists can tell you they have noticed the new extension on their Excel files, they don’t understand the underlying changes.
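If you are not sure which format a file really is (extensions get renamed, and attachments lose context), its first few bytes give it away: old .xls files are OLE2 “compound documents” that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while .xlsx files are ZIP archives that begin with `PK\x03\x04`. A rough Python sketch follows; the function names are hypothetical, and as the comments note, other Office formats share these signatures, so this only tells old from new Excel files you already know are spreadsheets.

```python
# Magic numbers at the very start of each file format.
XLS_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (.xls, Excel 97-2003)
ZIP_SIGNATURE = b"PK\x03\x04"                      # ZIP local file header (.xlsx)


def sniff_excel_format(head):
    """Guess a spreadsheet's format from its first bytes.

    Caveat: .doc/.ppt files share the OLE2 signature, and .docx/.pptx
    (or any ZIP file) share the ZIP one.
    """
    if head.startswith(XLS_SIGNATURE):
        return "xls"
    if head.startswith(ZIP_SIGNATURE):
        return "xlsx"
    return "unknown"


def sniff_file(path):
    """Read just the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_excel_format(f.read(8))
```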
Of course, you can tell Excel to generate and save files in the old .xls format by going to the “Excel Options… Save” and changing your settings so files are saved as .xls:
Or on a Mac, use the “Preferences… Compatibility” menu:
The Good & Bad: Web Application versus Add-in
If you missed it, I recently posted about the future direction of the DCXL project. I boiled it down to the question of add-in versus web application. The community has offered feedback, and some major themes have emerged, which I summarize below. But first, a reminder of the goods and bads of our two possible approaches:

Web application

| Good | Bad |
|---|---|
| Easier to maintain, update | Requires learning new user interface |
| Use with any platform (Mac, Windows, Linux, …) | |
| Generalizable/extensible | Not integrated into Excel |
| Community involvement easier | Offline use may be limited |

Excel add-in

| Good | Bad |
|---|---|
| Integrated in workflow | Windows only |
| Familiar user interface & functionality | Install & updates required |
| Smaller shift in practice | Not as generalizable/extensible |
| Available offline | Not as easy for community to get involved in development, improvement |
It seems that there are strong feelings on both sides of this issue. The majority are excited about the web application, but there are some serious concerns about going whole hog into the web application realm. Most of this apprehension stems from two major issues: potential problems when offline, and the lack of a visible DCXL presence in the Excel program.
Offline use: Metadata is best collected at the time the data are collected, which means the scientist might not have an internet connection. We should make sure that any features associated with generating metadata are available offline.
DCXL presence within Excel: what if we devise a way to connect the Excel user directly to the web application from within Excel? A “Lite” version of the add-in?

If we assume that we can tackle the two problems above, then the web application might be a great direction to take. The DCXL project should focus on assisting scientists with metadata generation first, and connection to repositories second. Both of these tasks may be easier with a web application. Metadata generation could be aided by connecting to existing metadata schemas and standards, and a generalizable API would make those connections easier. More interesting is the possibility of connecting with repositories and institutions: what if there were a repository-specific implementation of the DCXL web application for each interested repository? Or a DCXL web application specifically geared towards the Geology department at UC Riverside? The possibilities for connecting with existing services become more interesting if web connections are made easy.
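As a sketch of generating metadata from what is already in the file, the Python below pulls out column names, a guessed type per column, the row count, and the number of missing values. The function name `describe_csv` and the record layout are hypothetical placeholders, not any particular standard; mapping the result onto an existing schema such as Ecological Metadata Language (EML) is the part a real tool would add.

```python
import csv
import io


def _looks_numeric(value):
    """True if the text parses as a number (e.g. "12.1", "-3", "1e5")."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def describe_csv(text):
    """Build a minimal metadata record from CSV text.

    The record layout (row_count, columns, type, missing) is a hypothetical
    placeholder; a real tool would map it onto a standard such as EML.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = [row for row in reader if row]  # csv yields [] for blank lines
    columns = []
    for i, name in enumerate(header):
        # Non-blank values in this column; short rows count as missing.
        values = [row[i] for row in rows if i < len(row) and row[i].strip()]
        kind = "numeric" if values and all(_looks_numeric(v) for v in values) else "text"
        columns.append({"name": name, "type": kind,
                        "missing": len(rows) - len(values)})
    return {"row_count": len(rows), "column_count": len(header),
            "columns": columns}
```

A scientist would still supply what the file cannot tell us (who collected the data, where, and why); the point is that the tedious, mechanical parts can be pre-filled.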
Needless to say, we still want feedback from the community. Decisions will be made soon, so drop me an email or comment on the blog to make your voice heard.
Help Wanted: Add-in versus Web Application?
I recently updated this site with a page listing the DCXL Requirements. These five requirements are the basic feature set and capabilities we would like to have for the Excel add-in that is to be developed in the course of the project. The engineering team at Microsoft Research reviewed our requirements and made a (rather surprising) suggestion: instead of an add-in, they recommended a web-based application.
Add-ins are little pieces of software that you can download to extend the capabilities of a program – in our case, Microsoft Excel. Synonyms for add-in are plug-in and add-on. They are downloaded, installed, and then appear within a specific program. An add-in for Excel would appear in the Excel “ribbon”, and would add new features to Excel.
A web-based application is something a bit different. It’s software that runs on a remote server and that you use over a network, usually through a web browser. Web applications require the web (shocking, I know) and do not require that you download a program. Instead, you use an internet connection and the web-based application. Basically, these are web sites that do more than just display information – they do something with the information or files provided by the user, on the user’s behalf. Web sites such as Facebook, YouTube, and SkyDrive are examples of web applications.
So I turn to you, community: what are your thoughts on this? Make your voice heard! You can email me directly, comment on the blog below, or come on down to CDL’s Downtown Oakland office and tell me in person. But please comment quickly – this decision needs to be made soon. You can also vote using the quick poll in the sidebar to the right of this post. We want to know what you think!
To help you formulate intelligent comments, here’s a rough comparison of the two options:
Add-in: The user would download the add-in and install it on their current machine. They could then perform the DCXL tasks on their current spreadsheet via a new “ribbon” that appears at the top of the Excel window.
Web application: The user would go to the website hosting the web application and upload (drag and drop) a spreadsheet to the site. They could then perform the DCXL tasks on the spreadsheet and download it back onto their PC.
| Feature | Office add-in | Web-based application |
|---|---|---|
| Platform compatibility | Windows only | Any |
| Spreadsheet compatibility | Different add-in for each Excel version | One application covers multiple versions; potential future expansion to SQL, CSV, XML, OpenOffice, Google Docs, etc. |
| Download necessary? | Yes | No |
| Software updates | Bug fixes require download & re-install | No download/re-install necessary |
| Cloud-based? | No | Yes |
| Offline use? | Yes | No (HTML5 may enable offline use in the future) |
| Languages | C#/.NET, C/C++ | HTML/JavaScript, C#/ASP.NET |
| Has all the functionality of Excel? | Yes | No |
And here are the basic capabilities we want, regardless of which of the two options above becomes a reality:
- Must work for Excel users without the add-in
- No additional software (other than add-in and Excel) necessary
- Can be used offline
- Perform CSV compatibility checks, reporting, and automated fixes
- Add metadata to data file
- Can use existing metadata as a template
- Add-in can automatically generate some of the metadata where the info is available from the file
- Generate a citation for the data file
- Deposit data and metadata in a repository
Download the complete requirements as a PDF: DCXL Requirements
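The citation requirement can be similarly small at its core. This Python sketch assembles a citation from creator, year, title, publisher, and identifier, a pattern broadly similar to common data-citation recommendations such as DataCite’s; the function name, the exact punctuation, and the example values (including the DOI) are made up for illustration.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Format a data citation as: Creators (Year): Title. Publisher. Identifier

    The layout is an illustrative assumption, not a formal citation style.
    """
    if not creators:
        raise ValueError("at least one creator is required")
    authors = "; ".join(creators)
    # Turn a bare DOI ("10.xxxx/...") into a resolvable link.
    if identifier.startswith("10."):
        identifier = "https://doi.org/" + identifier
    return f"{authors} ({year}): {title}. {publisher}. {identifier}"


print(format_data_citation(["Smith, J.", "Lee, K."], 2012,
                           "Stream temperature logs", "Example Repository",
                           "10.1234/abcd"))
# Smith, J.; Lee, K. (2012): Stream temperature logs. Example Repository. https://doi.org/10.1234/abcd
```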






